INTRODUCTION


The scope and organization of this issue of the Review follow rather
closely those of previous issues on this topic. All the chapter topics of
the February 1941 issue are represented. In addition, two chapters deal
with construction and application of psychological tests in the armed serv- 
ices and the measurement of psychoeducational growth. No justification 
is required for the inclusion of what could be had on use of tests in the 
armed services. The chapter on measurement of psychoeducational growth 
is brief and is organized separately to give place for discussion of prin- 
ciples and uses of tests that fall outside of discussion of psychological 
tests in this issue and discussion of educational tests in separate issues 
devoted to the various subjectmatter areas. The chapter order has been 
revised at one point to bring into juxtaposition the discussions of con- 
struction and application of personality tests, letting the discussion of 
the more definitely clinical instrumentalities for studying personality, the 
projective technics, follow and thus stand somewhat apart. 

The reader is advised to read both of a pair of related chapters, Chap- 
ters II and III on intelligence tests and Chapters V and VI on personality 
tests, if he is interested in the subject of either. The circumstances of 
producing this issue have made it more difficult than is inherently the 
case to insure that each chapter includes bibliographical references highly 
appropriate to its topic and less appropriately treated elsewhere. In the 
interest of reducing overlapping discussions and repetition of references 
in bibliographies, with few exceptions each reference appears only once. 
It may be assumed that, since all are fallible, at least a few errors of 
classification, in addition to enforced deletions, have resulted in related 
chapters being complete only when taken jointly. 

It is well to repeat and underline the statement of the chairman of the 
February 1941 issue that bibliographies of the various chapters are neces- 
sarily incomplete. As the volume of publications in this field increases 
and the space allotment in the Review remains constant, abridgment of 
bibliographies becomes essential lest a huge compendium of references 
crowd out even the descriptive discussion of the references. 

Partly, perhaps, because of this growing pressure, bibliographical ref- 
erences have been reduced approximately 10 percent from the previous 
issue and the discussions have taken on a more evaluative character than 
formerly. Perhaps, too, the lack of an outlet and occasion afforded by 
the Mental Measurements Yearbooks in the preceding period has made this 
issue seem to the reviewers the place for critical comments on psychological 
tests and their uses that would have been expressed there. Whatever the 
explanation, this issue involves more extended comment and criticism than 
previous issues on this topic and is presented in that spirit. 


WARREN G. FINDLEY
Chairman, Committee on Psychological Tests 
CHAPTER I


Brief Overview of the Period 


WARREN G. FINDLEY 


This issue of the Review on psychological tests and their uses follows
the February 1941 issue on the same topic and occupies the same place
in the new cycle. Generally, it covers the literature on this topic from July
1940 to July 1943.


General References 


Among the important general references on psychological tests and their 
uses that appeared during this period are two issues of the Review. The 
issue of December 1941 on “Growth and Development” contains four chap- 
ters on social, emotional, and intellectual development. The studies re- 
ported, especially the extended longitudinal ones, depend at significant 
points on technics of psychological testing, while at the same time they 
become the most significant validating procedures for those technics. The 
extensive bibliographies and discussions constitute an important supple- 
ment to the material presented here. The issue of December 1942 devoted 
to “Methods of Research and Appraisal in Education” includes several 
chapters on statistical methods employed in using psychological tests in 
experimental studies. It also includes general chapters like the one on 
evaluative studies that describe the framework within which psychological 
tests are used to appraise educational practices and outcomes. 

The Nineteen Forty Mental Measurements Yearbook (1) merits special 
mention for establishing a useful pattern for the future, when it is hoped 
that subsequent issues will be produced, as well as for its immediate values. 
The unique strength of this publication lies in the fact that it draws upon 
critics of all types of competence, curricular, philosophical, and technical, 
for reviews of the same testing instrument. Not only is the reader enabled 
to compare authorities, but the authorities themselves are constrained to 
make careful comments by the threat of comparison. In this respect, the 
Yearbook holds a significant advantage over the type of presentation in 
the Review in areas where they overlap. In the 1940 Yearbook Buros took 
a significant step forward by including critical reviews of old tests currently 
in wide use as well as reviews of new tests. Continuous reevaluation of 
older instruments brings them publicly into contact with developing thought 
about tests, while the older tests also serve as touchstones for evaluation 
of new offerings. 

Remmers and Gage (2) offer a significant new textbook on educational 
measurement and evaluation. Much research reviewed in this and previous 
issues of the Review on this topic is presented by them in a framework 
that takes into account the psychological unity and organic development 
of individual human beings. The program of evaluation recommended is 
accordingly comprehensive in scope, continuous in its operation,
constructive in its application to the individual, and unified in its design
and interpretation.

The separate chapters of this issue include those references that provide 
extended bibliographies or general surveys appropriate to each chapter. 


Survey of Recent Trends in Psychological Measurement 


In contrast to the preceding period, the past three years have produced 
little research literature on the construction or revision of so-called intelli- 
gence tests. The use to which such tests are being put in selection and 
classification of personnel for the armed services is tremendous, but war 
duties of those primarily responsible for their technical construction and 
evaluation preclude extensive publication of research reports. Accounts 
of new instruments for appraising competence in the higher mental proc- 
esses involved in concept formation have appeared in sufficient numbers 
to constitute a trend. Much of this research stems from a hope, sometimes 
explicit, that if accurate measures of accomplishment in these processes 
can be obtained it may be possible to improve the skill of individuals in 
the processes. A second significant trend is represented in several studies 
of the patterns of performance, as distinguished from gross achievement, 
of individuals given intelligence tests. Stimulated somewhat by recent appli- 
cation of factor analysis to test data of this sort, this trend reflects chiefly 
a wider acceptance among psychometricians of the need for more detailed 
understanding of individuals thru test performance. Several comparative 
studies of available individual tests of intelligence combine to show that 
differences in variability of the tests in range of IQ values make it wise 
to employ caution in dealing with IQ’s from different tests and to adjust 
IQ values mathematically in instances where the relative variabilities are 
known.
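
The mathematical adjustment mentioned above may be illustrated by a minimal sketch. It assumes that the IQ's of both tests are centered at 100 and that the standard deviation of each test is known. The function name and the standard deviations of 16 and 12 are hypothetical values chosen for illustration; they are not figures from the studies reviewed.

```python
def adjust_iq(iq, sd_from, sd_to, mean=100.0):
    """Express an IQ from one test on the scale of another test.

    Assumption: both tests share the same mean and differ only in
    variability (standard deviation).
    """
    if sd_from <= 0 or sd_to <= 0:
        raise ValueError("standard deviations must be positive")
    # Convert the IQ to a standard score on the first test.
    z = (iq - mean) / sd_from
    # Re-express that standard score on the scale of the second test.
    return mean + z * sd_to


# Hypothetical case: an IQ of 132 on a test with SD 16 lies two standard
# deviations above the mean, which is 124 on a test with SD 12.
print(adjust_iq(132, sd_from=16, sd_to=12))  # 124.0
```

An adjustment of this kind equates relative standing only; it does not make the two tests measure the same functions.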

Research involving applications of intelligence tests has included the 
usual substantial number of studies indicating the correlation between 
intelligence test scores and academic achievement and survival. The field 
of research reopened by the impact of the Iowa studies on the perennial 
question of the constancy of the IQ has produced a number of studies 
tending to clarify further the influence of community differences in stand- 
ards of living and educational opportunity on the mental effectiveness of 
individuals measured by intelligence tests. Other studies of the influence 
of the home, expressed in terms of socio-economic level and occupational 
status of parents, have shown greater insight and sophistication and have 
thereby extended our understanding of this influence. One large-scale study 
of pupils tested in the Cleveland public-school system throws into dramatic 
relief the dual problems of controlling selection in studies of IQ trends 
and agreeing upon a more satisfactory measure than the IQ for indicating 
whatever constancy resides in the determinations based on intelligence test 
scores. If low IQ’s tend to drop and high IQ’s tend to rise in ordinary
situations, the claims of other indexes like the Heinis personal constant
need to be reexamined and some decision reached as to what is the most
valid index of mental development.

February 1944 BRIEF OVERVIEW OF THE PERIOD

A rather substantial addition has been made to the bibliography on the 
prediction of professional success, especially by the Minnesota studies. 
The general conclusion is that aptitude tests may add to the accuracy of 
prediction based on previous academic achievement but the latter is quite 
consistently the more predictive of the two. Multiple correlations involving 
aptitude test scores and previous achievement as joint predictors tend to 
be about .7. The substantial number of studies on visual acuity, color 
vision, hearing, mechanical and manual dexterity, gross motor abilities, 
clerical aptitude, vocational selection, and driving and flying aptitude may 
be taken as symptomatic of research in those areas, which has been inten- 
sified by the war effort. 
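
The way in which an aptitude test adds to a prediction from previous achievement can be shown with the standard formula for the multiple correlation of a criterion with two predictors. The sketch below is illustrative only: the three correlations are hypothetical values, not results from the Minnesota studies, and were chosen so that the multiple correlation falls near .7.

```python
import math


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion y with two predictors.

    r_y1, r_y2: correlation of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    if not -1.0 < r_12 < 1.0:
        raise ValueError("r_12 must lie strictly between -1 and 1")
    r_squared = (r_y1**2 + r_y2**2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12**2)
    return math.sqrt(r_squared)


# Hypothetical values: previous achievement correlates .65 with success,
# an aptitude test correlates .50, and the two predictors correlate .50.
r_joint = multiple_r(0.65, 0.50, 0.50)
print(round(r_joint, 2))  # about .68, only slightly above .65 alone
```

With these assumed values the joint prediction exceeds that from achievement alone by a small margin, which is the pattern described above.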

A number of new personality inventories have appeared but few of these 
have been extensively evaluated. The Bernreuter Personality Inventory has 
continued to lead in number of studies, but little new has been added to 
the previous conclusion that such tests have slight validity even in situa- 
tions where the temptation to falsify is absent. Factor analysis has been 
applied extensively to items of personality tests and a tendency to reach 
more consistent interpretative patterns is evident. Research with interest 
inventories has included demonstrating that simplified scoring of the Strong 
Vocational Interest Blank yields valid data and some evidence that the 
Kuder Preference Record yields similar valid results. Studies of attitude 
scales, stemming chiefly from Thurstone and Remmers, have increased be- 
cause of interest in the effects of war and propaganda on attitudes, espe- 
cially toward war. Attitude tests seem to be validated as group measures, 
whatever their validity for individuals. 

The reviewers of applications of personality and character measurement 
in this issue are more hopeful than were the reviewers of this same area 
in 1941. The explanation is to be found in the scope of studies attempted 
and the insight shown by the researchers. Well-conceived studies of group 
attitudes on important social issues and of the effects of instruction and 
experience on such attitudes have been reported more frequently. At the 
same time, studies using attitude scales have been based on a viewpoint 
that sees individual personalities as unified within themselves and inter- 
active with social forces rather than as loosely organized carriers of traits 
subject behavioristically to influences from the environment. 

An even more substantial advance in interpretation of personality has 
come thru research with projective technics. The Rorschach Test has been 
studied so thoroly and applied so extensively as almost to merit a chapter 
in itself. Highlights of the last three years include publication of the 
manual by Klopfer and Kelley and the development of a method of group 
administration. Thematic apperception tests have gained almost equally, 
and Murray’s Thematic Apperception Test and the Rorschach are frequently
mentioned jointly as “the two preferred methods for study of personality.”
Play technics of personality diagnosis have been explored considerably,
and a study of handwriting that promises to raise graphology
to the status of scientific technic has appeared.

The unity with which it is essential to view appraisal of psychological 
development is stressed in the chapter on measurement of psychoeduca- 
tional growth. Explicit recognition of the need for the “psychological use 
of tests” is cited from a number of publications. 

Several brief statements of progress in applying psychological tests in 
the various branches of the armed service are presented by authoritative 
representatives. In all these necessarily incomplete statements sufficient 
detail is given to make it clear that much is being done rapidly that will be 
available for report later and that what is being done has vast significance 
for education and educational research. 

The frequency of applications of factor analysis in the fields of psycho- 
logical testing has been matched by the frequency of evaluations and com- 
parisons of the various factor analysis technics. An important outcome 
has been a clarification of the similarities and differences among these 
technics. Sampling theory, which is more important in research involving 
applications of tests than in test building, has received considerable atten- 
tion which promises to develop a more complete rationale for the testing 
of hypotheses. Many technical aids have been developed, by way of formu- 
las and machine methods. 

The electric test-scoring machine has continued to exert a growing influ- 
ence on psychological tests. The International Business Machines Corpo- 
ration, manufacturers of the machine, has published, at intervals, lists of 
tests that have been adapted for machine scoring. Fortunately, the multiple- 
choice item, for which the machine is adapted and which it therefore 
tends to popularize, is the most versatile type of objective item for meas- 
uring information, skill, and reasoning power. The tests described by Smith 
and others (3) are only the most outstanding examples of the use of 
multiple-choice questions to test the higher mental processes. A full dis- 
cussion of the issues raised by the test-scoring machine is implicit in 
Remmers and Gage (2) where they present the relative merits of objective 
and essay type tests. 

The graphic item counter, a special attachment of the International Test- 
Scoring Machine, has greatly enhanced the value of the machine by pro- 
viding means of mechanically recording the frequency of one or more 
answers to each question and thereby laying the basis for item analysis 
of the questions. An additional use, in computing correlations, is cited in 
Chapter X of this issue. 
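
The tabulation that the item counter performs mechanically can be sketched in a few lines. The sketch assumes that each answer sheet is a string with one letter per question; the function name, the answer sheets, and the key are hypothetical and serve only to show the counts on which an item analysis rests.

```python
from collections import Counter


def tally_item_responses(answer_sheets, key):
    """For each question, count each answer chosen and the proportion correct.

    answer_sheets: list of strings, one character per question.
    key: string of correct answers, one character per question.
    """
    if not answer_sheets:
        raise ValueError("at least one answer sheet is required")
    results = []
    for i, correct in enumerate(key):
        # Frequency of every answer given to question i.
        counts = Counter(sheet[i] for sheet in answer_sheets)
        # Proportion of examinees choosing the keyed answer.
        p_correct = counts[correct] / len(answer_sheets)
        results.append((dict(counts), p_correct))
    return results


# Hypothetical data: four examinees, four multiple-choice questions.
sheets = ["ABCD", "ABDD", "CBCD", "ABCA"]
for number, (counts, p) in enumerate(tally_item_responses(sheets, "ABCD"), 1):
    print(number, counts, p)
```

The frequencies of the wrong answers, as well as the proportion correct, are retained because both enter into the evaluation of an item.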

This overview of trends in psychological measurement during the past 
three years leads to the conclusion that tests are being more carefully con- 
ceived, constructed, evaluated, and applied. More objective and valid tech- 
nics of appraising intellectual, mechanical, esthetic, emotional, and social 
development are being devised. Within the armed services some are being 
developed very rapidly. With the prestige that accrues from military
applications and the fundamental soundness of the new approaches in many of
the fields covered in this issue, psychological tests may be confidently
offered as tools of specific usefulness in research and application in the
world we choose to build.
CHAPTER II


Current Construction and Evaluation of 
Intelligence Tests 


ETHEL L. CORNELL 


A review of the literature of the last three years indicates some accelera- 
tion in certain trends which have been discernible, tho less clear-cut, before. 
These trends make it more and more difficult to draw any clear line of de- 
marcation between so-called “general intelligence” tests, comprehensive 
achievement tests, tests of “primary” factors as contrasted with “special 
aptitudes,” and other methods of evaluating or predicting performance, 
including projective technics, attitude scales, and longitudinal growth 
studies. If one attempts to limit oneself to measures called tests of general 
intelligence, one finds a rather barren field; if one thinks of intelligence 
tests as measures for evaluation of present status, or for prediction of future 
status, one finds a formidable barbed wire entanglement of the various ap- 
proaches mentioned. This chaotic situation is ascribed by Cattell (11) to a 
failure to clarify our thinking regarding the nature of intelligence. 

Two trends will be described in some detail. The first is concerned with 
efforts toward a better understanding of the process of concept formation, 
dealing with both verbal and nonverbal materials. The second is a comple- 
mentary approach—a better analysis of various patterns of intellectual 
performance. 


Trend toward Testing of Concept Formation 


The impetus toward this type of testing has come principally from ex- 
perimental work in clinical psychology on disturbances in intellectual proc- 
esses of patients with brain injuries. More recently some of these experi- 
mental methods have been adapted to the testing of normal groups. 


Sorting Tests 


A considerable number of studies have appeared in recent years which 
have utilized sorting tests to throw light on the process of concept forma- 
tion. The early work was done by Gelb and Goldstein and by Weigl in 
Germany during the ’twenties, with aphasic patients, using colored yarns
familiar in testing for color blindness, which the patient was asked to sort
according to varying classifications. A review of this work with later
confirmation and extension of the findings was made available to American
students in 1941 by a translation of an article of Weigl (54) and in a
monograph of Goldstein and Sheerer (20). The latter describes the
behavior of normal and abnormal adults with five tests involving sorting or
classification according to a principle which the subject must discover for
himself. The tests used were: (a) the Goldstein-Sheerer Cube Test, a
modification of the Kohs’ Block Designs, designed to reveal impairment in
“abstract behavior”; (b) the Gelb-Goldstein Color Sorting Test; (c) the
Gelb-Goldstein-Weigl-Sheerer (GGWS) Object Sorting Test, in which the
subject sorts a variety of miscellaneous objects according to any chosen scheme
and then is required to shift his frame of reference; (d) the
Weigl-Goldstein-Sheerer Color-Form Sorting Test, using the same principle with a
series of differently colored geometrical figures; and (e) the
Goldstein-Sheerer Stick Test, a test of a somewhat different order, requiring the
subject to reproduce a meaningless design with sticks. Responses to these 
tests were found to show two kinds of approach: making a classification 
based entirely on the perceptual impact of the situation, which was called 
the concrete attitude; and behavior characterized by a conceptual or ab- 
stract, volitional attempt to discover categories of classification. Either 
type of approach Goldstein and Sheerer regarded as a functional level of 
integration of the whole personality. The individual “as a whole gears him- 
self toward a specific direction of activity which we call abstract or concrete 
behavior” (20:2). Abstract behavior is conceptual, but not identical with 
verbal. 

A similar approach was made by Vigotsky, whose Concept Formation 
Test has been used to reveal the disintegration of thinking in schizophrenic 
and other clinical types by Hanfmann and Kasanin (25) and by Reichard 
and Rapaport (43). An interesting report on the patterns of behavior 
shown on this test by graduate students and hospital attendants was made 
by Hanfmann (24). The test as described by Hanfmann and Kasanin is 
easily administered and applicable over a wide range of ability. It consists 
of 22 blocks of 6 shapes, 5 colors, and 2 thicknesses, which the subject is 
told to sort into a fourfold classification. On the bottom of each block is 
printed one of four nonsense syllables which define the four categories. 
When the subject makes a wrong classification, he is shown the wrong non- 
sense syllable, and thus comes gradually to a concept of the classification 
represented. Hanfmann described the two types of approach to this prob- 
lem as perceptual and conceptual, but found that, contrary to what might 
be anticipated, in terms of efficiency of performance, the perceptual ap- 
proach required reliably less time and help for solution. The nature of the 
task, however, may have been the important factor. Even more important 
than the approach was the factor of whether the perceptual elements re- 
inforced or obstructed the conceptual “conscious intention.” When per- 
ceptual elements reinforced the conceptual approach, that is, when the 
pattern of performance was concordant, the solution was far more efficient. 
When the pattern was discordant, efficiency of performance was reduced, 
and in extreme cases of discordance, the conflict led to breakdown and an 
emotional reaction to frustration. There was no correlation, however, 
between perceptual-conceptual approach and concordant-discordant pat- 
tern of performance. An attempt with a small number of subjects to meas- 
ure the consistency of the test response by comparing it with “whole- 
response” and “normal details” responses to the Rorschach suggested that 
there was a tendency toward maintaining the same pattern. A reliable sex
difference was found, men being pronouncedly more often conceptual, 
women perceptual in their approach. (A similar sex difference among 
children was found by Rose and Stravrianos (44) using verbal materials.)
It was also found that the subjects who had a Ph.D. or M.D. degree favored 
the conceptual approach, as compared with those having no more than an 
M.A., tho no difference was found as between concordance and discordance. 
The reviewer would agree that “an attempt to consider intellectual
performances in relation to personality structure is a fruitful one” (24:325)
and should be further pursued.


Tests of Educing Relations


A different type of test, but one also concerned with concepts, or what its 
author calls “eductive ability,” was developed and quantitatively standard- 
ized by Raven (41). This is a series of sixty matrix tests, each test a design 
from which a part has been removed. There are five sets of problems, the 
initial problem in each set being easy and the sequence of problems in each 
set providing standard training in the process of “educing” the relations 
necessary to solve the problems. In its original clinical form, it was given 
as a performance test, the subject selecting the appropriate piece from 
several possibilities. It has been standardized as a paper and pencil test, 
however, and recently revised for use as a classification test for adults (42). 
It was standardized on a group of 1407 children 6 to 14 years of age, and 
3665 men between the ages of 20 and 30, selected so as to give a random 
sample of the entire population of those ages in an English manufacturing 
town. The median age-scores showed slow development up to age 8, fairly 
rapid development from 8 to 13½, at which age development reached
approximately the level attained by the group aged 20 to 30 years. At
percentile levels above the median, the ability showed earlier beginning of
maturity, a more rapid rate of development, and continuation over a longer 
period. Below the median, Raven found that development began later and 
developed more slowly over a shorter period of years, reaching its limit 
before the age of 14. The 95th percentile of adults represented the same 
point as the median of the university men in his sample. Raven concluded 
that “intellectual superiority” (the best 5 percent) might be defined as 
“that degree of eductive ability” required for pursuing university work 
satisfactorily. 

Raven considered that the rate of development was not constant and that 
IQ’s, therefore, should not be used as if comparable to Binet IQ’s. His 
growth curves, however, are similar to those obtained in this country in 
longitudinal growth studies, using Binet tests. 

A somewhat similar development has been taking place with the increas- 
ing recognition of conceptual, high-level abilities that are relatively in- 
dependent of verbal abstraction. Certain “spatial perception” tests are 
probably more dependent on ability to see abstract spatial or mechanical
relationships than on spatial perception per se, as Crawford (12) indicated.


Verbal Tests Emphasizing Concepts


The tests so far mentioned are especially interesting because they are 
nonverbal approaches to the problem of testing concept formation. A few 
verbal approaches to this problem have also been made. Cronbach (13) 
constructed tests of the vocabulary of social science and algebra courses in 
a “multiple true-false” form and found a wide range in the precision with 
which basic terms are used even by teachers. Wilking (55) examined the 
vocabulary of the Iowa reading scale with reference to the relation of its 
vocabulary to the frequency of concept categories in our language and con- 
cluded that more attention should be given to these in test construction. 


Tests of Critical Thinking 


A significant development in testing aspects of thinking concerned with 
the interpretation of data, the application of principles of scientific and 
logical reasoning, and appreciation of the nature of proof was made in the 
testing done for the Thirty Schools Experiment and described in the third 
volume of Adventure in American Education (26). These tests were rather 
complicated to score and required expert evaluation. A simpler instrument 
for general use is the Watson-Glaser Tests of Critical Thinking (52). These 
tests were used by Glaser (19) in an experiment to determine whether any 
aspects of critical thinking were susceptible to direct training. The tests 
were devised to measure several aspects: (a) a survey of opinions to show 
the extent to which a person tends to be consistent when statements
regarding current issues are expressed in opposite ways; (b) a test of logical
reasoning with syllogistic exercises and the elaboration of evidence; (c) a test
of ability to judge inferences drawn from statements of fact; (d) a test of 
generalization; (e) a test of discrimination in choosing arguments; (f) a 
test of the evaluation of arguments. Test-retest reliabilities of the various 
parts were in the .80’s and validity of items was sought by requiring the 
unanimous agreement of fifteen expert judges. A controlled experiment 
with instructional materials developed by Glaser showed highly significant 
gains for the experimental as compared with the control groups in twelfth 
year English classes after ten weeks of instruction, altho no significant 
gains were made in the Otis Quick Scoring Test of Mental Ability. Glaser 
concluded that the aspect of critical thinking most susceptible to improve- 
ment was the attitude of being disposed to consider an issue thoughtfully. 
He pointed out that the development of skill in applying a method of log- 
ical inquiry appeared to be limited by the extent of information possessed 
about the specific problem toward which the thinking was directed. There 
was a tendency for pupils of higher IQ to gain more from training than 
pupils of lower IQ, altho the lowest tenth in IQ actually gained more than 
the highest tenth. The extent to which these tests are related to the compre- 
hension of language is shown by a correlation of r = .77 between the initial 
Watson-Glaser tests and the Martin Reading Comprehension Test (a test 
still in experimental form). The correlation with the Otis Quick Scoring 
Test was much lower, r = .46. Glaser asks: “Can ‘ability to behave
intelligently’ be improved by training in critical thinking, and can ability to see
the relations required by intelligence tests be improved by a longer period 
of training than was here attempted?” (19:180). Piersel (37), in similar 
vein, predicts that future intelligence tests may be expected to expose 
whether the processes of acquiring new information and of synthesizing 
it with the old are alike or different. Further research may throw more 
light on the relations of verbal and nonverbal test approaches to the devel- 
opment of concepts and abstract thinking. 
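
The test-retest reliabilities and the correlations with other tests reported above are product-moment correlations between two sets of scores from the same persons. A minimal sketch of the computation follows. The function name and the two sets of scores are hypothetical, and the data are not those of Glaser's experiment.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length, at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of squares and cross-products of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary in both lists")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical first and second administrations of the same test.
first = [42, 55, 61, 48, 70, 53]
second = [45, 52, 64, 50, 68, 57]
print(round(pearson_r(first, second), 2))
```

The same function applies whether the second list is a retest, as in a reliability coefficient, or a different test, as in the correlations with reading comprehension and with the Otis test.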


Trend toward Analysis of Patterns of Performance 
in General Intelligence Tests 


The significance of patterns of performance as aids to diagnosis, prog- 
nosis, and counseling has been given new emphasis in recent work. The 
difference between the newer pattern approach and the old profile descrip- 
tion was reviewed by Bijou (6), who pointed out the inadequacies of pres- 
ent tests as well as the need for better analysis of the essential aspects of 
behavior to be measured. He and Esher (15) showed that patterns were 
different among different types of maladjusted persons. Rantman (40) also 
noted that patterns of difficulty in the revised Stanford-Binet were different 
for mental defectives and normals, and Laycock and Clark (30) reported 
on the differences between young-bright and old-dull of equivalent mental 
and socio-economic status. Kuhlmann (29) called attention to some of the 
still unsolved problems of factor analysis, pointing out that even when 
factors were identified, their importance still needed validating, and cau- 
tioning against the danger of substituting statistical “analysis of functions” 
for empirical trial against a criterion in test construction. 

Olson and Hughes (36), in a study of the relation of separate aspects of 
growth to the whole, developed a measure of “organismic age” which had 
greater stability than mental age alone because of the fluctuations in differ- 
ent aspects of growth, which suggested a compensating factor. Morrow 
(34) made a study of scholastic aptitude and several vocational aptitude 
tests of college students, reaching the conclusion that an organismic hypo- 
thesis was necessary to explain the relationships found, which were “func- 
tional and dynamic relationships within the total personality.” Speed and 
level were found by Baxter (3) to vary independently among college 
sophomores, and a better prediction of attainment was obtained when both 
were combined. MacPhail (31) found that the differential relationships 
reported in the test manual between Q and L scores on the ACE Psycholog- 
ical Examination and achievement in quantitative and linguistic subjects, 
respectively, did not hold in his institution—which suggests that situation, 
as well as personality, has some effect on test results. Patterns of different 
professional student groups on the Primary Mental Abilities Tests were 
found by Stuit and Hudson (48) to be quite different, with high points not 
necessarily in the tests having the highest predictive values for the profes- 
sional group. “This fact supports the contention that unitary measures of 
intelligence are not sufficient alone to characterize the mental ability
requirements for a professional group” (48:182).

Primoff (39) developed a formula for determining the extent of shift 
from expected score within the range of tests of a single individual in order 
to get intercorrelations which could be factor-analyzed to throw light on 
individual ability patterns. Evidence of stability of individual patterns from 
preschool age thru high school comes from the Minnesota growth studies 
(22), which Goodenough thinks suggest “that these differential patterns 
probably depend at least in part upon inherent characteristics of the in- 
dividual organism” (22:96). 


Comprehensive Achievement Tests as Tests of Intelligence


One of Tyler’s basic assumptions (51) for an evaluation program is that 
the way in which an individual organizes his behavior is an important 
aspect to be appraised and that in any given situation, information is not 
separated from skills, attitudes, interests, appreciations, or ways of think- 
ing. Comprehensive subjectmatter tests at the sophomore college level were 
examined by Selover (45) for their possibilities as predictive of success in 
various professional schools. Findley (16), at the secondary-school level, 
developed a comprehensive test for scholarship awards based on subject- 
matter, which he described as a “test of general intellectual competence 
accumulated and maintained for use.” The tests developed by the Coopera- 
tive Test Bureau and by the Evaluation Staff of the Progressive Education 
Association have been in this direction for several years. The need for 
measuring outcomes of school in terms of maturity and range of develop- 
ment was also pointed out by French (18). Such tests, as Wrightstone (57) 
has pointed out, have involved the ability to organize facts, to interpret 
data, to apply principles to new situations in dealing with various fields of 
subjectmatter. These abilities are aspects of what Wrightstone and others 
have called “critical thinking,” a term which seems to have much the same 
connotation that “the educing of relations” has for the British, or that “ab- 
stract ability” has for other American psychologists. 


New Tests and New Standardizations 


The foregoing trends have been described in some detail since they are 
thought to be fruitful in clarifying the meanings of “general intelligence.” 
Space remains for only the briefest mention of other new tests and evalua- 
tions. Original data on the scaling of the ten tests used in the Chicago 
Mental Growth Battery (17) have been published in such form that any 
worker may supplement them with additional data. Standardization of 
the Junior Scholastic Aptitude Test developed by the Secondary Education 
Board for placement purposes in junior high school was described by Traxler
(50). In the clinical field, a finger schema test was developed by Werner
and Carrison (53), apparently related to arithmetic achievement at a
mental level of six to ten.

Growdon (23) and Spache (47) made arrangements of the revised
Stanford-Binet as a point scale, both of which gave as satisfactory correla- 
tions with total scale as did the abbreviated scale and seemed more satis- 
factory. All workers with the revised Stanford-Binet caution against stop- 
ping with the first year of complete failure. Berger and Speevak (5) found 
an average increase of three months by extended testing, with a range up to 
twelve months. 

Martinson and Strauss (32) developed a qualitative scale to supplement 
quantitative scoring of the Stanford-Binet, which might assist inexperienced 
clinical examiners. Pintner (38) enlarged all the visual materials of the 
new Stanford-Binet for use with partially-sighted children, and altho he did 
not find significant differences in score, he thought the enlarged materials 
met with a better response in sight conservation classes. 

Thorndike (49) reported the standardization of two rapid screening 
vocabulary tests, two parallel forms of twenty words each chosen from the 
vocabulary section of the CAVD tests. The test had an estimated reliability 
of r = .83 for a cross-section adult group, which was probably adequate
for rapid screening purposes. Two unstandardized verbal tests were devel- 
oped by Hebb (27) with the purpose of avoiding vocabulary difficulty at 
the higher levels of verbal difficulty: (a) an extension of the Van Wagenen
analogies test, and (b) a completion test of the Minkus type. Ackerman 
(1) revised the Viennese tests for young children. 


Evaluations 


A number of studies (2, 4, 7, 8, 9, 10, 35, 46, 56) comparing old and 
revised Stanford-Binet, short and long forms of Stanford-Binet, Stanford- 
Binet with Kuhlmann-Anderson, Kuhlmann Tests of Mental Development, 
Wechsler-Bellevue, and Kent EGY Tests have appeared. They agree that
tests cannot be used interchangeably and that caution must be exercised in 
interpreting IQ’s because of the dissimilar variability of different tests. 
The Stanford-Binet test has been found to have a wider variability than the 
Wechsler-Bellevue. On the basis of the regression equation calculated from 
the correlation among adult mental patients, Benton, Weider, and Blauvelt 
(4) made a table of equivalent scores indicating a revised Stanford-Binet 
IQ of 100 to be equivalent to a Wechsler-Bellevue IQ of 97, while at a 
Stanford-Binet level of 60, the equivalent Wechsler IQ was 72, and at a 
Stanford-Binet level of 140, the equivalent Wechsler IQ was 122. A com- 
parison of Stanford-Binet with Kuhlmann Tests of Mental Development 
among mentally deficient adolescents by Carlton (10) suggested that the 
upper psychometric limit of feeblemindedness should be placed at a lower 
point on the Kuhlmann than on the Stanford-Binet. 
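
The three equivalents reported above for Benton, Weider, and Blauvelt (4) fall on a single straight line, as values read from a regression equation should. The following sketch recovers that line from the three reported points only. The slope and intercept are inferred here and are not quoted from the original table, and the function name is illustrative.

```python
# Equivalents reported in the text:
# revised Stanford-Binet IQ -> Wechsler-Bellevue IQ (adult mental patients).
REPORTED = {60: 72, 100: 97, 140: 122}


def wechsler_equivalent(sb_iq, points=REPORTED):
    """Linear equivalent inferred from the lowest and highest reported points."""
    lo, hi = min(points), max(points)
    slope = (points[hi] - points[lo]) / (hi - lo)  # (122 - 72) / 80 = 0.625
    return points[lo] + slope * (sb_iq - lo)


# The middle reported point (100 -> 97) is reproduced by the same line.
for sb in sorted(REPORTED):
    print(sb, wechsler_equivalent(sb))
```

A slope below 1 reflects the narrower spread of Wechsler-Bellevue IQs in this sample. On the inferred line the two scales agree at an IQ of 92 and diverge toward either extreme.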

Goodenough’s suggestion (21) for a formula for converting all IQ’s into 
an IQ-equivalent based on the standard deviation is a practical and useful 
method for equating the variability of different age groups. Kuhlmann 
(28) found that the Heinis personal constant was as applicable to Stanford- 
Binet as to his own scale and yielded more constant values on retesting than 
the IQ. DeForest (14) compared Merrill-Palmer and Minnesota Pre-school
Tests and found both correlated about r = .55 with later Binet. 
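
An IQ-equivalent based on the standard deviation, such as Goodenough's suggestion (21) mentioned above, can be illustrated as a standard-score conversion. This is a minimal sketch under stated assumptions: the reference mean of 100, the reference standard deviation of 16, the function name, and the two example groups are all hypothetical and are not taken from Goodenough's paper.

```python
def iq_equivalent(iq, group_mean, group_sd, ref_mean=100.0, ref_sd=16.0):
    """Re-express an IQ on a common scale (ref_mean and ref_sd are assumed values)."""
    z = (iq - group_mean) / group_sd  # deviation in standard-deviation units
    return ref_mean + ref_sd * z


# The same obtained IQ of 120 in two hypothetical age groups whose IQs
# have the same mean but different variability.
print(iq_equivalent(120, group_mean=100, group_sd=20))  # 116.0
print(iq_equivalent(120, group_mean=100, group_sd=10))  # 132.0
```

The same obtained IQ thus stands for a more exceptional performance in the less variable group, which is the point of equating variability before comparing ages or tests.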

McNemar’s analysis of the standardization data of the 1937 Stanford- 
Binet revision has been published (33). This includes analysis of the rela- 
tion of the test construction to variability, verification of the fact of in- 
creasing reliability with decreasing size of IQ, factor analyses tending to 
substantiate the importance of a common factor for all tests at the same 
level and, possibly, at different levels. Nevertheless, group factors were 
clear enough to suggest that selection of tests could be made to provide a 
vocabulary scale, a nonverbal scale, and a scale for “immediate memory.” 

All these studies leave one with a conviction that the possibilities of pre- 
diction seem more promising than they did five or ten years ago. The most 
significant advance will probably come from the results of the many growth 
studies, which are treated in another number of the Review.


CHAPTER III


Applications of Intelligence Tests 


FRANK S. FREEMAN


DURING THE LAST THREE YEARS, published research on the applications of
intelligence tests has been very much along the usual lines. With few excep- 
tions, the results of these researches were not unusual or surprising. A large 
portion of the published studies were concerned with conditions and predic- 
tions of scholastic achievement at all levels. Applications and usefulness of 
the 1937 Stanford-Binet, in particular, were the subjects of an appreciable 
number of researches; and Thurstone’s “primary mental abilities” were 
given close scrutiny in a number of situations. Constancy of IQ (or equiva- 
lents), community differences, socio-economic status, intelligence of non- 
white groups, atypical groups, sex differences, reading disability in rela- 
tion to test performance, and several other miscellaneous problems received 
more or less attention. For the most part, students of these problems will 
find that the results of research of the three years covered herein largely 
confirm earlier findings. 


Intelligence Tests and Educational Achievement 


Elementary school—Cohler (20), studying a group of pupils in Grades 
VI-VIII, all having IQs of 120 or higher, explored differences between 
“achievers” (those working up to expectation) and “nonachievers.” He 
found the expected inverse relationship between achievement and IQ. The 
difference in favor of the “achievers” was greatest in arithmetic and least 
in reading comprehension; and on the whole, “achievers” were about one- 
half semester ahead of “nonachievers” in school achievement. For achieve- 
ment age and MA, the r was +.58. 

Feinberg (24), studying IQ and EQ of children referred to a mental 
hygiene clinic, reported correlations between the two quotients varying 
from —.37 to +.76. The former was found for those having IQs between 
44 and 69; the latter, between 120 and 192. The correlations were negative
for the IQ-groups below 90, but positive for all groups above 90 IQ. The 
author concluded that EQ cannot be substituted for IQ, as some have sug- 
gested, in dealing with children who present clinical or school problems. 
Thorndike, Woodyard, and Weingart (76) argued, however, that if school 
progress correlates .80 or better with intelligence tests, such progress may 
be treated as having the same value as intelligence tests, since coefficients 
of correlation for two different tests are rarely much over .80. The authors 
correlated IQ and age of children in Grade VI of seventeen cities. The me- 
dian r in northern cities was .79; in southern cities it was .85. 

Benson (8) reported on the scholastic survival of 1680 children of dif- 
ferent intelligence levels, tested in the sixth grade. The r for IQ and grade 
level attained was +.57. Billhartz and Hutson (10) found the r between
Stanford Achievement Test results, at the junior high-school level, and
“college quality-point averages” was +.36, while IQs obtained in junior
high school correlated +.37 with “college quality-point averages.” Neither
of these criteria was as good as the junior high-school marks whose r with
“college quality-point averages” was +.56.

High school—At the high-school level, intelligence tests have been used 
for a variety of purposes, all of them, however, being aspects of educational 
and vocational guidance. Hartson and Sprow (30) found that the Ohio 
State Psychological Test was more satisfactory in predicting college marks 
than other tests taken in high school; for the Ohio correlated +.50 with 
one semester’s work, and +.44 with seven semesters’ work; whereas the
others showed rs of +.34 and +.31, respectively. More important than 
these correlations is the fact that while 80 percent of those above 126 IQ 
became seniors, 64 percent of those below 116 IQ also did so. Obviously, 
there can be little sound guidance on the basis of these and similar data 
alone. Hartson (29), following up the foregoing study, reported a correla- 
tion coefficient of +.20 for high-school marks and mean college marks, 
whereas a coefficient of +.70 was found for the mean of college scholarship
and the mean of intelligence test performance, a rather striking and unusual 
difference. 

Livesay (42) analyzed the American Council Psychological Examination 
scores of 2255 high-school seniors in Hawaii. The mean score of those ex- 
pecting to enter college was 150 (S.D. 52), while the mean of those not 
expecting to enter was 104 (S.D. 41), an appreciable and significant differ- 
ence. The 521 who actually entered college had a median score of 169 
(S.D. 46). Livesay (43) analyzed the test scores of the same group on the
basis of vocational choice. As usual, those expecting to enter the professions
(other than teaching) had the highest mean score, 157 (S.D. 55); teaching
was second with a mean of 146 (S.D. 51); then business, 133 (S.D. 55);
semiprofessions, 125 (S.D. 52); agriculture, 124 (S.D. 49); clerical, 118 
(S.D. 42); skilled trades, 104 (S.D. 45). The considerable overlapping of 
groups is of outstanding significance, even tho the differences between the 
means were statistically significant. Of some importance in guidance are 
the findings of Livesay (41) in respect to subject preference. Students pre- 
ferring mathematics had a mean score of 150 (S.D. 53); language, 141 
(S.D. 54); science, 135 (S.D. 49); commercial, 111 (S.D. 40); “expres- 
sion” subjects, 98 (S.D. 43). Again, we note these data indicate “trends”; 
but the wide overlapping of groups once more emphasizes the importance 
of the individual as the educational unit. 
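
The extent of the overlapping noted above can be gauged roughly from the reported means and standard deviations alone. The sketch below uses Livesay's figures for seniors expecting and not expecting to enter college (means of 150 and 104, S.D.'s of 52 and 41). It assumes that scores in each group are approximately normally distributed, which the source does not state, and the variable names are illustrative.

```python
from statistics import NormalDist

# Normal curves fitted to the reported means and S.D.'s (normality is assumed).
college = NormalDist(mu=150, sigma=52)      # expecting to enter college
non_college = NormalDist(mu=104, sigma=41)  # not expecting to enter college

# Share of the non-college group scoring above the college-group mean.
share_above = 1 - non_college.cdf(college.mean)

# Overlapping coefficient of the two fitted curves (0 = none, 1 = identical).
overlap = college.overlap(non_college)

print(round(share_above, 2), round(overlap, 2))
```

Under this assumption roughly one in eight of the seniors not expecting to enter college scores above the mean of those who do, so that a statistically significant difference between means still leaves many individual exceptions.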

Layton (39) found that eighth-grade mathematics marks predicted high-school
algebra marks better than did the Otis S-A test, the correlation coefficients
being .82 for the former and .55 for the latter. When Otis test results
and eighth-grade mathematics marks were combined to predict algebra
marks, the multiple correlation coefficient was found to be .84, hardly
enough increase to warrant the added labor and time in testing and calculating.

Mitrano (58), using the Revised Alpha Test, found that of the pupils
applying for admission to a technical-industrial high school, the mean
age-group scores were: 100 at age thirteen; 91 at age fourteen; 82 at age
fifteen; 78 at age sixteen. The age-groups fell in the same order in their
performances on the Revised Minnesota Paper Formboard Test and the
McQuarrie Test of Mechanical Ability. Allison and Barnett (3), using the
1938 ACE test, with college freshmen, found statistically reliable differences
between the mean Q and L scores in high schools with different size
enrolments. The differences, tho statistically significant, were of doubtful 
educational significance, as indicated by the standard deviations of the 
high schools grouped according to enrolment (0-149, 150-499, 500 and 
up) and by the weak “trend” revealed by the r of .37 between gross scores 
on the test and high-school enrolment. 

College and university—Ellison and Edgerton (23) gave the Thurstone 
tests to forty-nine students. Coefficients of correlation between test scores 
and point-hour ratio ranged from .44 (for “verbal factor”) to —.21 (for 
“spatial visualization”). The coefficient for point-hour ratios and total 
scores, weighted for the seven factors, was .64. Factor V (verbal), how- 
ever, is the only one of the seven that has a consistently high r with marks 
in the subjects of English, science, foreign languages, and psychology. 
Smith (65) found the r for high-school grade-point average and one year’s 
junior-college marks was .70; the r for Thurstone test results and the same 
marks was only .44. The multiple correlation coefficient (R) for college 
grade-point averages with high-school averages and Thurstone scores was 
.73. The Thurstone test scores added little to the predictive value of high- 
school averages in this study. Yum (82) found, among others, the follow- 
ing correlations of the “factors” of the Thurstone test with scholarship in 
divisional studies at the University of Chicago: verbal factor, .35; induc- 
tive reasoning, .29; deductive reasoning, .26. These were the higher corre- 
lations. The social science group had the highest correlation between test 
scores and scholarship, .58. The “number factor” correlated .37 with 
women’s scholarship and .05 with men’s. Using the verbal factor, inductive 
and deductive reasoning, R was only .41, but was just as good as the R 
found when all seven “factors” were correlated with scholarship, .42. 
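
How little a second predictor adds in studies of this kind can be made concrete by comparing squared coefficients, that is, the proportions of criterion variance accounted for. The sketch below applies this to Smith's figures (r of .70 for high-school averages alone, R of .73 with Thurstone scores added). It also gives the standard two-predictor formula for R; the correlation between the two predictors (`r12`) is not reported in the text, so the value passed to it is purely hypothetical, as are the function names.

```python
from math import sqrt


def multiple_r(r1, r2, r12):
    """Multiple correlation of a criterion with two predictors.

    r1, r2 -- correlation of each predictor with the criterion
    r12    -- correlation between the two predictors (not reported; assumed)
    """
    return sqrt((r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2))


def variance_gain(r_best_single, r_multiple):
    """Added proportion of criterion variance accounted for by the combination."""
    return r_multiple**2 - r_best_single**2


# Smith (65): high-school average alone, r = .70; with Thurstone scores, R = .73.
print(round(variance_gain(0.70, 0.73), 3))  # 0.043

# Illustration only: R for r1 = .70 and r2 = .44 under a hypothetical r12 of .50.
print(round(multiple_r(0.70, 0.44, 0.50), 3))
```

On Smith's figures the combination accounts for only about four additional percent of the variance in junior-college marks, which is the sense in which the Thurstone scores "added little."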

Mitchell (57) found the revised Stanford-Binet correlated .64 with the 
marks of a sampling of liberal arts freshmen. The four-year course aver- 
ages of the entire senior class in medicine correlated only .15 with the 
Stanford-Binet ratings, due probably to the relative homogeneity of these 
students and to the specialized character of their studies. Heston (31) gave 
the Ohio State Psychological Examination and seventeen performance tests 
to 113 male college freshmen. The r for the Ohio test and grade-points was 
.54. The rs for the performance tests and grade-points were very low, negli- 
gible when positive; and some were negative. When the results of the seven 
best performance tests were combined with the Ohio test results, R was 
found to be .65.

Harrison (28) found that prospective teachers at Parks College, over a
period of ten years, were slightly but not significantly superior to non- 
teacher students on the ACE test. Seagoe (63) found prospective elemen- 
tary-school teachers to be superior on the ACE test. The mean score of 125 
such men and women gave them a percentile rating of 81. The middle 68 
percent of the group achieved percentile ranks from 56 to 96. Schneidler 
and Berdie (62), having tested students in the several colleges and profes- 
sional schools within a university, concluded that on the whole there is a 
“tendency” for college aptitude of the “average student” in colleges of edu- 
cation, agriculture, dentistry, and pharmacy to be lower than in science, 
technology, business, literature, and arts. There was the usual considerable 
overlapping; and the differences became much smaller as students were 
compared from freshman to senior year. It was pointed out that in some 
of the higher ranking student-groups, a selective process had already been 
at work, for their “freshmen” had satisfied a prerequisite of two years of 
study in a liberal arts college. 

Bernreuter and Goodman (9) applied the Thurstone test to engineering 
freshmen. Intercorrelations among “primary abilities” ranged from +.03
to +.45, with a median of +.24. The rs for the separate tests of “primary
abilities” with semester averages ranged from +.04 to +.38. R for the five
highest “primary abilities” (number, verbal, space perception, induction, 
deduction) with semester averages was .51. The rs for separate tests of 
“primary abilities” with marks in individual courses of study were quite 
low for the greatest part, ranging from 0 to .44. These results are not en- 
couraging for the guidance of prospective engineers by means of these 
tests of “primary abilities.” Stuit and Hudson (73) also used the tests of 
“primary mental abilities,” not only with students of engineering but also 
with students of journalism and medicine. Coefficients of correlation be- 
tween grade-point averages and the various “primary abilities” varied in 
the case of engineering students from .15 to .57; for medical students, from 
—.21 to +.35; for journalism students, from .01 to .50, indicating that 
the separate tests of “primary abilities” have only a minor predictive value 
so far as these three groups of professional students are concerned. 

The ACE and the Minnesota Paper Formboard tests were used by Bryan 
(17) to find the extent to which measured intelligence contributes to suc- 
cess at the School of Fine and Applied Arts, Pratt Institute. For the entire 
group of approximately 1000 students tested, the mean percentile rating 
was 49; for men, it was 44; for women, 54. Taking the group as a whole, 
r for ACE results and achievement in art courses was a negligible .16. For 
students in architecture, r for ACE test results and average marks in art 
was .37; for these same students, average marks in design correlated .22 
with ACE results. The corresponding coefficients of correlation for art 
education students were, respectively, .05 and .02; while for students in 
design they were, respectively, .12 and .09. The Minnesota test results were 
no more valuable, the rs ranging from .17 to .33. The results of this study 
indicate that success in these courses in fine and applied arts depends 
largely upon abilities other than those measured by either of the two tests
employed. 

Bruce (15) found that the CAVD scale has some value as an instrument 
in discriminating between students at the graduate level. Using levels M, 
N, O, P, Q of the test, Bruce found candidates for the doctorate, as a group, 
ranked higher than candidates for the master’s degree; it was “virtually 
certain” that students above the 75th percentile will make higher marks 
than those below the 25th; the r for marks and test performance was .52. 

Stalnaker (69) found that 49 percent of an entering class discontinued 
their college careers before the end of the fourth year. In each semester a 
larger proportion of withdrawals came from the lower half than from the 
upper half in intelligence scores. Altender (4), using the Henmon-Nelson, 
the revised Minnesota Paper Formboard, and two interest and adjustment 
inventories, came to the conclusion that, so far as guidance of college stu- 
dents is concerned, psychological tests are useful from a qualitative rather 
than a quantitative point of view. Asher and Gray (5) studied personality
traits as they were related to intelligence and college success. They devised 
their own personal history inventory and assigned point values to practically
all items. These “personal history scores” correlated .06 with intelligence
test scores. The correlation between “personal history scores”
and the point-hour ratio was .30. Intelligence test scores correlated .51 with 
the point-hour ratio. When the element of length of academic survival was 
worked into the criterion of college success, the coefficients were .36 for the 
general ability test and college success, and .39 for “personal history 
scores” and the criterion. 


Constancy of Intelligence Ratings 


McHugh (51), using Forms L and M of the Stanford-Binet, tested and 
retested ninety-one kindergarten children, the interval being 1.9 months. 
The mean attendance of the children was thirty half-day sessions. Initial 
mean IQ was 100, with S.D. of 12; the retest mean IQ was 106, with S.D. 
of 11. The r for the IQs of the two tests was .75. Children who attended 
kindergarten between five and nineteen three-hour sessions gained an aver- 
age of 3.7 points; those who attended 40-49 sessions gained an average of 
6.6 points. For all the children, the r for IQ gains and number of sessions 
attended was .15. Analysis of the individual items of the tests showed 11.2 
percent improvement on the “speech items,” but only 4.7 percent improve- 
ment on the “manual items.” One would expect about equal improvement 
if the principal factor were only better adjustment to the test situation. 
Allan and Young (2) retested one hundred children with the 1937 Stan- 
ford-Binet, after intervals of one to twelve years. Initially, the tests used 
were the 1916 or 1937 Stanford-Binet or the Merrill-Palmer. Mean age at 
the initial test was four years; at retest it was nine years and five months. 
Thirty others were given one of these individual tests initially, but were re- 
tested with the American Council Psychological Examination. The group 
of one hundred showed a mean gain of 8 points in IQ, the increase being 
greater in the longer intervals. Those having initial IQs of 115 or higher
showed a mean gain of 3 points over an average of five years, seven 
months; whereas the group between 93-114 IQ made a mean gain of 14 
points in five years, four months. In the whole group, two and one-half 
times as many individuals gained as lost IQ points. The reported changes 
are larger than those found in most earlier studies. 

Lowell (45) reported results of retesting with Binet tests 3000 children 
referred to the psychological clinic of the Cleveland school system. Of 
these, 1000 were tested twice, 1000 three times, and 1000 four times. Three 
percent did not vary at all; 26 percent changed 1-6 points; 71 percent 
changed 7 or more points. Most of these changes were losses. The mode 
for the IQs of the first 3000 tests was in the 90-94 interval; for the second 
3000 tests it was in the 75-79 interval; for the 2000 third tests it was in 
the 70-74 interval. A group of 394 cases first tested at age 5, then retested 
three times at intervals of a year or more, had median IQs of 81, 77, 71, 
66 at the four testings. These surprising results are probably attributable 
to the fact that the children tested were visually handicapped, deaf, crippled, 
behavior problems, mentally defective, or mentally retarded. Unless these 
and other handicaps operating selectively and cumulatively account for 
the marked decline, it would be impossible to explain these data; for the 
Binet tests are not so unreliable, nor is it likely that a group of even 3000 
atypical children have deteriorated so markedly. Street (71), on the other 
hand, found that of 920 children, exceptional in respect to physical, emo- 
tional, or intellectual traits, only 43 changed 10 points or more in Binet IQ. 

Barnes (6), using the ACE test, found appreciable gains during the 
first two years of college. The gains in L scores were more marked than in 
Q scores. On the initial test the mean of the subjects tested was 9 percentile 
points above the mean of the national scores; on the retest, the mean of 
the subjects tested was 34 percentiles above the national mean. The r for 
initial test and retest was .78. Hunter (34), also using the ACE tests, 
retested 276 women during their four-year college course. The retest at 
the end of the freshman year showed a mean gain of 23 in percentile rank 
over the initial results; at the end of the sophomore year the mean gain 
in percentile rank over initial test scores was 24; at the end of the junior
year, it was 26; and at the end of the senior year it was 31. Correlations 
between initial tests and retests were, respectively, .81, .85, .84, .83. Stu- 
dents with lowest initial scores made most improvement; of the 276 stu- 
dents, only 14 had losses. McRae (52), retesting Scottish school children 
with more than a half-dozen group tests and with the 1937 Stanford-Binet, 
concluded that the effect of practice was to produce an average improve- 
ment of about 2 points in IQ, that individuals are not consistent, that 
successive tests do not produce equal gains, and that the first two tests act 
as “shock absorbers.” 

Wallin (77) reported the retesting of two sisters with the 1916 Stanford- 
Binet. One sister got 27 semiannual tests, while the other got 26. Wallin 
analyzed in detail the IQ variations on the basis of external and internal 
factors, including the use of 16 and 14 years as the maximum denominator
of the IQ formula, and the adequacy of the tests at the extremes. He also 
presented detailed developmental data and health records of each subject. 
Both sisters had a range of 27 points in IQ; but the average variation was 
small. In the case of one sister, P.E., using 16 as the maximum denominator, 
was 3.5 points, while it was 4 points when 14 was used instead of 16. 
For the other sister the P.E.’s were 3 and 4 points, respectively. It is 
obvious that the range of fluctuations is much more significant than their 
average. Wallin concluded, and not surprisingly, that fluctuations are due 
to a variety of causes, and that broad diagnostic and prognostic generali- 
zations should not be ventured in clinical work without careful considera- 
tion of all internal and external factors that affect responses to psycho- 
logical tests. 
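
The probable errors cited above for Wallin's two subjects may be easier to compare with other studies when expressed as standard deviations. The sketch below assumes the usual normal-curve relation, in which the probable error is 0.6745 times the standard deviation. Whether Wallin's P.E.'s were computed on that basis is not stated in the text, and the function name is illustrative.

```python
# Normal-curve relation: probable error = 0.6745 * standard deviation.
PE_FACTOR = 0.6745


def pe_to_sd(pe):
    """Standard deviation implied by a probable error (normality assumed)."""
    return pe / PE_FACTOR


# Probable errors of 3, 3.5, and 4 IQ points, as reported for the two sisters.
for pe in (3.0, 3.5, 4.0):
    print(pe, round(pe_to_sd(pe), 1))
```

On this assumption a probable error of 3.5 points corresponds to a standard deviation of about 5 points, small beside the reported range of 27 points for each sister.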

Davis (21) reported the effects upon Binet IQ ratings when, instead 
of using the regular method of scoring, the examinee is credited with all
items up to the level of his highest achievement, regardless of whether or 
not he actually passed the items lower in the scale. Using this method, 
Davis found with 367 clinical cases that the mean increase in IQs was 19 
points, but that relative positions rarely changed. Greatest gains were 
made by those having reading or personality difficulties. In no instance 
did the new technics indicate a change in original recommendations for 
those examined at the clinic. Quite aside from the fact that the use of 
the new technic would give a different meaning to IQs derived thereby— 
as compared with those found by the usual method—it ignores the impor- 
tant fact that in clinical work, especially, an analysis of performance on 
different types of items is essential. 

The student of IQ variability within the individual should consult in 
detail the volume by Dearborn and Rothney (22), who analyzed a large 
number of factors which were found operative in a longitudinal study 
covering twelve years. The largest gains occurred on the first repetition 
(annual), but noticeable gains were noted for as many as four trials. 
It was also found that the general tendency was for smaller gains to occur 
at the lower part of the distribution and larger gains at the upper. 


Exceptional Groups 


The thesis of the study by Wile and Davis (79) was that the IQ, tho 
significant, cannot be interpreted intelligently in clinical cases without 
reference to the basal age and the specific tests passed or failed. Results 
of 100 clinical cases are reported, tested with the 1937 Stanford-Binet; 
CA range, 5-12 years; MA range, 3-12 years; IQ range, 80-130. Of the 
subjects, thirty-two had a double basal-age on the test. The school diffi- 
culties of these children indicated that weakness in visual memory and 
visual-auditory association underlies most of their difficulties in making 
school and nonschool adjustment. Fifty-nine percent had mixed eye-hand 
functions; 12 percent were right-eyed but left-handed; 46 percent were 
left-eyed but were using the right hand; 33 percent were converted sinistrals.
Pignatelli (60), reporting on 606 children, found that on the
Stanford-Binet the patterns of mental functioning do not differentiate be- 
tween (a) problem children as compared with nonproblem children; 
(b) extremely grave problem children as compared with children whose 
behavior problems are of lesser consequence. 

Thompson (74) reported results obtained with the Otis (for Grades 
IV to VIII) given to 287 male prisoners, and the Illinois Examination I 
(for Grades III to V) given to 280 prisoners. Of the total, 286 had IQs 
below 70; and 65 were below IQ 50. There were no conclusive differences 
in average IQ on the basis of the type of crime committed.

Pintner (61) reported IQs of 602 partially-sighted children mostly ten, 
eleven, and twelve years old. The 1937 Stanford-Binet was used, but the 
visual materials of the test were enlarged. The mean IQ was found to 
be 95; and there were no differences between groups having 20/70 vision 
and those from 20/70 to 20/200. The IQs reported were, on the whole, 
higher than those found by others testing partially-sighted children with 
other tests. Pintner believed that these children are handicapped in taking 
the usual standard group tests. He concluded that the mean IQ of partially- 
sighted children is probably 96 or 97. 

The reader should consult the volume by Abel and Kinder (1) for a 
comprehensive study of subnormal adolescent girls in their various aspects 
and with respect to the origins and control of this group. For a study of 
the characteristics and promise of individuals at the very superior extreme, 
children above 180 IQ, the reader is referred to the volume by Holling- 
worth (32). 


Environmental Factors 


Community differences—Shepard (64) compared urban and rural chil- 
dren with respect to nonverbal and verbal abilities. He found that rural 
children were superior on the Minnesota tests of spatial relations and 
mechanical assembly, and were probably superior also on the Kwalwasser- 
Dykema Music tests. Urban children were superior on the Otis test of 
mental ability and on tests involving speed of performance. 

Worbois (81) compared changes in IQ as between children in a rural 
consolidated school and those in a rural one-room school, over a period 
of one or two years of schooling (within the first three grades). In the 
three groups studied, pupils in the consolidated schools gained over the 
others. One group, after the first year in school, showed an advantage of 
5 points in IQ, tho the initial difference was only 1 point. After the first 
two years of school, another group had an advantage of 13 points, tho 
there was no initial difference; while the third consolidated-school group, 
after Grades II and III, were superior by 13 points, tho the initial difference 
was 7 points. 

Thorndike and Woodyard (75) analyzed National Intelligence Test 
results obtained with sixth-grade children from thirty cities. They found 
in these cities, of 20,000 to 30,000 residents, the usual wide range of 
scores within cities, the overlapping of scores of all cities, and differences 
between median scores. They then studied the test scores in relation to 
each city’s “goodness of life for good people in the city” (sic); “index 
of personal qualities of the city’s residents”; and “index of per capita 
income of the city’s residents.” As might be expected, the coefficients of 
correlation between test scores and these three socio-economic criteria were 
high, being .86, .82, and .78, respectively. They conclude that the average 
IQ of a community is the greatest single factor in its welfare; but they 
failed to consider that the socio-economic welfare of people might have a 
bearing on the quality of a community’s average IQ. 

Brown and Cotton (14) made intra-group comparisons of Italian and 
Polish children living in deteriorated and nondeteriorated areas of Chicago. 
The results, taken as a whole, were equivocal. Italian children (age range, 
8-16 years) in the poorer areas had a mean test score somewhat higher 
than Italian children from the better areas, whereas the situation was re- 
versed for Polish children. In both national groups, the means were some- 
what below the norms of the test. Wheeler (78) repeated in 1940 an earlier 
study of Tennessee Mountain children, using 3252 subjects in 40 schools. 
In the ten-year interval there had been definite improvement in the social, 
educational, and economic status of the communities. The 1940 pupils 
were chronologically younger (average, eight months) and mentally higher 
(average, nine months) for their grades than children of ten years ago. 
The average child of the 1940 group was 10 IQ points higher than the 
average of 1930. At the same time, however, Wheeler found what has been 
found so often before in disadvantaged areas, namely, a decrease in IQ 
averages with increasing age. For example, mean at age 6 was 94; at 
11, it was 80; at 16, it was 73. 

Socio-economic level—Stroud (72) found the not unexpected correlation 
coefficients of .40 to .50 between school achievement and socio-economic 
status, and between tested intelligence and socio-economic status. More 
important, however, was the fact that pupils from disadvantaged homes 
did not get on so well or learn as well at school as did pupils of the 
same intelligence but from homes higher in the socio-economic scale. 

McGehee and Lewis (50) furnished additional confirmatory evidence on 
the subject of the ratio of superior and retarded children to “normal 
expectancy” in the several occupational and socio-economic levels. They 
found that the ratios for superior children were professional, 2.4; semi- 
professional and business, 1.62; skilled, .88; semiskilled, .92; unskilled, 
.30. Ratios for retarded children were, respectively, .14, .49, .98, 1.39, 1.53. 
Thus, in this sampling of children in Grades IV-VIII, from thirty-six states, 
all levels of intelligence were found at all levels of parental occupation, tho 
there was the usual positive association between higher intelligence and 
higher socio-economic level. It is most significant, however, to observe 
that this study, like others, showed that the great bulk of superior chil- 
dren come from “average” homes. The authors might well have pointed 
out that children from “average” homes are now as a group, and have 
been for some years, in a position to develop and foster their abilities 
thru improved public education and thru more intelligent parental care, 
psychological and physical. Fleming (25), in Glasgow, Scotland, dealing 
with children 8-21 years of age, found a correlation of .30 between socio- 
economic level and test performance. 

Maddy (48) found the expected differences between median IQs of 
children of semiskilled workers and of professional men, namely, 97 and 
111, respectively. Standard deviations were 11.95 and 11.05. Then it was 
found that the average IQ of 31 children of professional men living in 
“nonprofessional neighborhoods” was 3 points lower than for children of 
professional men living in wealthier neighborhoods. Conversely, the aver- 
age IQ of 21 children of semiskilled workers living in superior neighbor- 
hoods was about 4 points higher than for children of the semiskilled living 
in poorer neighborhoods. These differences are suggestive; but in this 
study the numbers compared were too small to justify generalizations. 
A study by Livesay (44) in Hawaii showed the expected hierarchy of chil- 
dren’s average intelligence test ranks when classified according to fathers’ 
occupations. 

Foster homes—Layman (38) reported IQ changes in older children 
(mean age at placement, 135 months; range, 6 to 16 years) placed in foster 
homes (average residence, 26 months; range, 7 to 166 months). All sub- 
jects, 120 in number, came from homes of low socio-economic status. The 
average percentages of IQ change on the 1916 Stanford-Binet were: tests 
1 and 2, 8 percent; tests 1 and 3, 9 percent; tests 2 and 3, 5 percent. The 
important conclusion was that large IQ differences were found usually in 
cases where underlying emotional adjustments have changed in either a 
favorable or unfavorable direction. Speer’s study (68) indicated the im- 
portance of early foster-home placement of children coming from undesir- 
able environments, for there was a negative relationship between a child’s 
IQ (Binet) and the time he spent in his original disadvantaged environ- 
ment; children placed before three years of age had a median IQ of 100; 
the median of those placed between three and five was 87; for those be- 
tween six and eight, it was 82; between nine and eleven, it was 82; between 
twelve and fifteen, the median was 67. It is also noteworthy that children 
who were defective on the first mental test remained so, the average gain 
being only .5 of a point in IQ for those below 70 IQ originally; dull chil- 
dren (IQ 70-89) made an average gain of 6 points; the average group 
gained consistently, but an average of only 3 points. 

Stoddard (70) has assembled, critically interpreted, and shown the 
social-educational implications of the Iowa and other studies of nature- 
nurture and intellectual development. Goodenough and Maurer (27) criti- 
cized the Iowa studies of intellectual development under nursery-school 
environment on grounds of “fallacious statistical practices,” namely, fail- 
ure to explain results on the basis of “statistical regression due to errors 
of measurement.” 
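
The regression argument can be illustrated with a small simulation. The sketch below is not taken from Goodenough and Maurer; every value in it is a hypothetical assumption (a true-score standard deviation of 13 IQ points, an error standard deviation of 7, and a selection cutoff of 85). It shows only that children selected for low first-test scores will, on the average, score higher on a retest even when nothing about them has changed.

```python
import random
import statistics


def simulate_retest(n=20000, true_sd=13.0, error_sd=7.0, cutoff=85.0, seed=0):
    """Return (first-test mean, retest mean) for children selected as low scorers.

    All parameter values are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    first_scores, retest_scores = [], []
    for _ in range(n):
        true_iq = rng.gauss(100.0, true_sd)           # unchanging true score
        first = true_iq + rng.gauss(0.0, error_sd)    # first test, with error
        retest = true_iq + rng.gauss(0.0, error_sd)   # retest, independent error
        if first < cutoff:                            # selected on the first test only
            first_scores.append(first)
            retest_scores.append(retest)
    return statistics.mean(first_scores), statistics.mean(retest_scores)


first_mean, retest_mean = simulate_retest()
print(f"selected group: first test {first_mean:.1f}, retest {retest_mean:.1f}")
```

Under these assumed values the retest mean of the selected group lies several points above its first-test mean, although no true score was altered. The size of this apparent gain depends entirely on the assumed reliability and cutoff, so the sketch speaks to the logic of the criticism and not to the Iowa data themselves.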


Negro Groups 


Canady (19), using the ACE Psychological Examination with 497 
Negro subjects, found that the mean score of children of professional 
fathers was 96; of artisans’ children, 91; of unskilled laborers’ children, 
79. His results also confirmed the now widely held view that Negroes differ 
among themselves more than they do from white groups. Significant also 
are the mean scores obtained in the several types of communities, as fol- 
lows: large cities in the North, 113; large cities in the South, 111; small 
cities and towns in the North, 93; small cities and towns in the South, 79; 
border states, 86. The author concludes that Negro-white differences in 
intelligence test performance are due “in all probability” to the Negro’s 
inferior position in the American social system. 

The study by Bruce (16) dealt with 521 white (age range, 6 to 12.75 
years; mean, 9.5 years) and 432 Negro children (same age range; mean, 
9.75 years) of extremely low socio-economic status in the rural South. The 
impoverishment of the area is revealed by the fact that altho 50 percent of 
the tax budget went for educational work, the annual amount spent was 
only $14.77 per child. On the Kuhlmann-Anderson test the median IQs 
were white, 89; Negro, 72. On the 1916 Binet: white, 90; Negro, 74. On 
the Arthur Performance Scale: white, 93; Negro, 75. The rs between socio- 
economic ratings (Sims scale) and IQs ranged from .44 to .57. Bruce also 
found the important fact of a decline in IQ with increasing age. When the 
individual items of the tests were analyzed, it was found that some—both 
verbal and nonverbal—were unfair to children of this handicapping en- 
vironment and that some reveal the influence of the local culture and cul- 
tural values. 

Witty and Theman (80), in a follow-up study of 84 gifted Negroes whose 
IQs ranged from 120 to 200 in 1934, reported that their educational attain- 
ment was above average, but lower than for other gifted pupils. Yet, their 
level of attainment was gratifying in view of the meager opportunities open 
to some of them. Their median percentile rank on the Myers-Ruch High 
School Progress Test was 58.9; standard deviation, 24.2. Underageness for 
grade persisted; and where data were available, all but one fell in the 
highest fifth of the high-school graduating class. Jenkins (35) reported 
case studies of 16 Negro children (age range, 5 years and 2 months, to 10 
years and 8 months) having IQs between 160 and 200. As a result of the 
study of these children, their family backgrounds, and their general en- 
vironment, Jenkins concluded that individual differences rather than racial 
differences are significant, and that these cases “bring into sharp focus the 
limitations which our society places on the development of the highly 
gifted Negro.” 


Miscellaneous 


Adult group—Mitchell (56) reported results for 155 patients at a psy- 
chopathic hospital and for 67 liberal arts freshmen tested with both the 
1916 and 1937 revisions of the Binet. The IQ means were 91 and 105, re- 
spectively. The differences increased with intellectual level as judged by 
educational history. The author’s conclusion is that the 1937 Stanford-Binet 
seems to be a more useful instrument than the 1916 revision in discriminat- 
ing between adults of average and superior intelligence. Mitchell (55) also 
reported results obtained with 268 adolescent and adult patients of a mental 
hospital, using the Wechsler-Bellevue and the 1937 Binet, Form L. Mean 
IQs were 89 and 87, respectively; S.D. 20 and 27, respectively; ranges, 39- 
134 and 28-150, respectively; r between the two tests was .89. In spite of 
the close agreement, the Binet gave higher IQs for the group of patients 
up to 30 years; the Bellevue was higher for the group above 40 years; the 
Binet was lower for the duller group and higher for the brighter group. 
Mitchell (54) found that 1937 Stanford-Binet IQs correlated .64 with the 
average marks of 67 college freshmen, but only .15 with the 4-year averages 
of 86 medical students. Equally or more significant was the fact that mul- 
tiple basal years were found for about half the subjects, with the implica- 
tion, therefore, that in testing young adults of at least average intelligence 
with Form L, it will be necessary to go as low as year XII or XIII to obtain 
the basal year. If these findings are typical, they raise a serious question 
concerning the use of the test with adults. The results obtained by Manuel 
and others (49) also raised questions concerning the proper calibration of 
the 1937 Stanford-Binet when applied to college students. For 53 cases, 
they reported an r of .67 between the 1937 Stanford-Binet and the ACE 
examination, large discrepancies in individual cases, and, on the former, 
IQs that run too high. The new Binet revision and college performance 
correlated .45. 

Sex differences—Canady (18) reported sex differences obtained with the 
ACE Psychological examination, the subjects having been 1306 Negro 
college freshmen (637 male, 669 female). The ranges, means, medians, 
standard deviations, and coefficients of variation for total scores showed 
negligible differences. But there were reliable sex differences in subtests, 
males being superior on the numerical tests whereas females were superior 
on the language tests. Livesay (40) found the same to be true with respect 
to 2255 high-school seniors in Hawaii (1264 male, 991 female), as regards 
total scores. The male group was superior in arithmetic, while the female 
group was superior in artificial language and opposites. 

Reading difficulties and bilingualism—Kavruck (37) matched two 
groups of delinquent boys (50 in each) with respect to CA and IQ. Age 
range was 13-16 years; IQ range was 66-112. One group attained the read- 
ing standard expected of their mental ages; the other group were at least 
two years below the expected reading standard. Analysis of the perform- 
ance of this latter group on the 1937 Binet showed they were inferior on 
the strongly verbal items (vocabulary, definitions of abstract words, 
Minkus Completion, dissected sentences); but they were superior in mem- 
ory for designs, sentence memory, and constructing bead-chain from 
memory. Mellone (53), using a verbal group, a nonverbal group, and an 
individual mechanical reading test with children 8-11 years old, found that 
the verbal-test IQ average was somewhat lower than the nonverbal, tho not 
among those above 9½ years of age. 

Spache (67), who compared the Kuhlmann-Anderson results with the 
language and nonlanguage sections of the California Test of Mental Ma- 
turity, concluded that the Kuhlmann-Anderson underestimates the intelli- 
gence of children who are handicapped in reading or who are bilingual. 
Smith (66), studying groups of Japanese, Korean, Chinese, Hawaiian, and 
Caucasian college students in Hawaii, found that a bilingual background 
affected performance on the college aptitude test more than it did actual 
college achievement. 

Intelligence and social development—Lurie and collaborators (46, 47) 
studied the relationship between IQ and social quotient (SQ). In a group 
of 140 boys and girls referred to a guidance clinic, the behavior of 31 
percent was in accord with both IQ and SQ; in 36 percent behavior was in 
accord with neither; 13 percent behaved in accord with IQ alone; and 1° 
percent with SQ alone. It is thus clear in this report that neither mental 
age nor social age furnishes a basis for forecasting the level of behavior 
which a “problem child” may present. When IQs and SQs were compared, 
it was found that the lowest mental group, with mean IQ of 40, had a mean 
SQ which was 26 percent higher; while the highest mental group, mean 
IQ of 140, had a mean SQ 31 percent lower. Thus the Stanford-Binet and 
the Vineland Social Maturity Scale appear to measure some different 
factors and to supplement each other. The results also suggest that mentally 
retarded children tend to mature socially beyond their intelligence level, 
while the social maturation of the mentally very superior does not keep 
pace with their intelligence. 

Bonney (12) found that “social acceptance” ratings were poorly cor- 
related with IQs, three different rs being .32, .34, and .31. Nor were ratings 
on mutual friendships significantly correlated with IQ, namely, .19, .34, 
and .20. The same author (11) reported that the relationship of degrees 
of attraction and rejection with IQ was low. Apparently, brightness is not 
a guarantee of social competence in the earlier school grades. 

Various—O’Hanlon (59) found a correlation of —.21 between IQ and 
size of family in a slum clearance area. He reported, also, that the better 
nourished children tend to be more intelligent, while intelligence is little, 
if at all, affected by congestion, per se, in the home. Forlano and Ehrlich 
(26) reported on the relation of season of birth to intelligence. Their re- 
sults conform to those published in earlier studies, namely, that the differ- 
ences reported as statistically significant are of no psychological import. 
Katz (36) reported on intelligence as related to height and weight of chil- 
dren of American born parents, North European stock, from favored socio- 
economic groups. The results were obtained in a longitudinal study of 112 
boys and 117 girls. For the boys, measured between ages three to five years, 
the correlations between height and IQs (medians of 5 examinations) were 
virtually zero; for girls, the same correlations were .36-.40. In the case of 
weight and intelligence, the rs were zero for boys and .27-.34 for girls. No 
satisfactory explanation could be found to account for these sex differences. 
Boynton (13) found, with sixth-grade children, that certain hobbies (such 
as collecting, reading history and science, playing musical instruments) 
were more likely to be associated with higher intelligence; but no hobby 
was consistently associated with lower intelligence. Also, pronounced sex 
differences existed. The sex differences, we may assume, are culturally 
determined. Bennett (7) made an elaborate analysis of the influence of 
rate of work on group and individual tests of intelligence. The time spent 
on each item was recorded. The correlations between the rate of “success- 
ful work” and attitude were consistently positive but negligible, being 
between about .10 and .22. The author concluded that the influence of time 
restrictions is fairly negligible. Tho the methods employed in this study 
were different from those of earlier studies, the general conclusions are in 
essential agreement. 


Summary 


For the greatest part, the published papers have dealt with familiar 
problems, adding further confirmatory evidence. It is amply clear by now 
that most intelligence tests are useful in predicting educational achieve- 
ment, but that in themselves they are not sufficient. In this area, the Thur- 
stone tests of “primary abilities” have been subjected to particular scrutiny, 
and their value is left very much in doubt. The reports on clinical use of 
tests also lead to the already well-recognized conclusion that no single test, 
indeed no group of tests, even good ones, is adequate; for elusive, non- 
measurable, yet extremely important human factors which profoundly 
affect behavior, including learning, have not thus far yielded to psycho- 
metrics. The problem of “constancy” remains where it was three and more 
years ago, and will probably remain there unless the nature of mental 
development and activity changes, or until improved tests demonstrate 
that mental development and activity are not what we believe them to be 
today. No unexpected data were reported for exceptional groups. Investi- 
gations of other specialized problems—such as rate of work, physique and 
intellect, reading retardation, social acceptance, and friendship of pupils 
—tho of interest, have but contributed further evidence in support of 
earlier findings. 

Studies dealing with environmental factors—tho relatively few in this 
period—showed once again that there is a significant relationship between 
intelligence levels and environmental quality. The recent data are open to 
the same debate on nature-nurture as earlier data have been. Whatever the 
reason, however, it appears that appreciable improvement in important 
aspects of environment will result, in general, in improved intelligence test 
results. The social and educational implications of this are clear. As yet, 
no definitive and entirely satisfactory study of the problem has been made; 
and perhaps none will be made, due to the difficulties inherent in any at- 
tempt to make a sufficiently prolonged longitudinal study of mental devel- 
opment. Yet a good, close approximation to a definitive study could be 
made if individuals and foundations pooled their energies and financial 
resources now going into piecemeal investigations, too many of which 
excuse their results by saying they are “exploratory” and “tentative.” The 
conclusion we can now come to, at any rate, is this: since a large company 
of educators, psychologists, sociologists, and biologists holds that environ- 
mental forces are very significant in all aspects of human development, we 
as a nation cannot afford to neglect the environmental conditions of any 
part of our society if we are concerned with optimal development of our 
people and with fostering human resources wherever found. 


CHAPTER IV 


Measurement and Prediction of Special Abilities 


SAUL B. SELLS 


THE SELECTION of contributions to the literature for this summary was 
limited by the exclusion of all material published outside of the United 
States and of original contributions by the armed services. Many out- 
standing developments in the measurement and prediction of special 
abilities have been reported in British, Russian, French, German, and 
other foreign journals. Military and naval contributions are covered 
in another chapter of this issue. The final selection favored original 
contributions over reviews and general discussions. 


Criteria for Aptitude Measurement and Prediction 


Bellows (7) made a comprehensive analysis of the fallibility of criteria 
used in validation and standardization of aptitude tests. He discussed 
three frequent sources of contamination in empirical criteria: illicit use 
of predictor information, artificial limitation of production, and differential 
influence of experience; and six specific factors affecting them: statistical 
reliability, correlation with other criteria, predictability, acceptability 
to the job analyst, acceptability to the sponsor of the study, and produc- 
tion of a practical change in the situation by use of the derived instru- 
ment. Solomon (112) enumerated as fundamentals of selection testing 
the following: tests should be restandardized for particular situations, 
batteries of tests predict better than single tests, the pattern of an 
individual’s test scores must be considered for better prediction and 
improvement of the individual, and validity and reliability of a test 
should be based on actual performance on a job. 


Scientific Aptitude 


Benton and Perry (11) obtained correlations of .30 and .37 between 
scores on the Stanford Scientific Aptitude Test (Zyve) and four-year 
college grades for forty-three students. Marshall (82) tested forty-seven 
college students and obtained the following correlations with the Stan- 
ford Scientific Aptitude Test: freshman and sophomore science grades, 
.40; the Moss Medical Aptitude Test, .9; biology grades, .52; physics 
grades, .42; and chemistry grades, .36. MacPhail and Foster (80) ex- 
tended an earlier study of predictive indexes of grades in elementary 
college chemistry. For eighty students they obtained a multiple R of .76, 
using Iowa Chemistry Training scores, high-school rank on a sigma 
scale, and Cooperative General Mathematics scores as predictors. Stuit 
and Lapp (116) found ability in mathematics most closely related to 
college physics achievement. The Iowa Physics Aptitude Test and the 
Iowa Mathematics Aptitude Test predicted this criterion with a fairly 
high degree of accuracy. Spatial relations, as measured by the Minnesota 
Paper Form Board, and understanding of mechanical movements, as 
measured by the Thurstone Primary Mental Abilities Tests, did not bear 
a close relationship to college physics grades. 


Prediction of Professional Success 


Professional organizations and universities have shown increased in- 
terest in the development of selective standards of admission to the 
preparatory stages of professional training. Kandel (61) surveyed the 
efforts that have been made in medicine, law, and engineering. Medical 
aptitude tests have been rather universally adopted and are the only 
ones which have achieved any degree of satisfactory standardization. 
Kandel discussed the limitations of predictions based solely on tests. 


Business Administration 


Douglass and Maaske (33), in one of a series of University of Min- 
nesota studies in prediction of academic success in professional schools, 
found the adjusted prebusiness-school honor point ratio the best pre- 
dictor of first-year and second-year honor point ratios. A modification 
of Wesley’s College Test of Social Terms and the Business Mathematics 
Test ranked next in predictive value. The multiple R of these three meas- 
ures with first-year honor point ratio was .765. Students entering the 
School of Business Administration below the age of 20.5 succeeded 
better than older students. 


Dentistry 


Douglass and McCullough (35) found predental honor point ratio 
the best single predictor of first-year dentistry status. This measure, 
together with vocabulary scores of the Iowa Dental Qualifying Examina- 
tion, Mechanical Judgment B (same test), and the Minnesota Metal 
Filing Test yielded a multiple R of .652 with first-year dental honor point 
ratio, based on ninety-five students. Robinson and Bellows (101) found 
positive relationships between success in certain courses in the first 
two years in dental school and several subtests of the MacQuarrie Test 
for Mechanical Ability, Finger Dexterity test, Cooperative Chemistry 
and Zoology tests, Strong interest scores, predental honor point ratio, 
and number of semester hours credited in chemistry. Thompson (120) 
obtained significant differences between dental students (35 freshmen and 
40 seniors at Ohio State) and fine art students (50 students) on a battery 
of motor and mechanical tests. 


Engineering and Defense Training 


Brush (21) studied the relation of measures of mechanical aptitude to 
grades in engineering courses. The Minnesota, Cox, and MacQuarrie tests, 
the O’Connor work samples, and the Stenquist picture tests all appeared 
of slight predictive value. First-semester and first-year grades were more 

REVIEW OF EDUCATIONAL RESEARCH Vol. XIV, No. 1 

closely related to the total engineering college record than any combina- 
tion of tests. Mercer (89) obtained a multiple R of .708 between a battery 
consisting of number completion, English usage, science information and 
arithmetic problems tests, the MacQuarrie Block test, the Thurstone-Jones 
Sketching Test, the Detroit Pulleys test, and freshmen engineering grades. 
McGehee and Moffie (86), who tested students at North Carolina State 
College enrolled in Engineering, Science, and Management Defense Train- 
ing Courses, reported various significant relationships. 


Law 


Douglass, Luker, and Lovegren (34) found prelegal college work the 
best predictor of Law School success. This measure correlated .50 with 
first-year Law School average grades, and .49 with three-year averages. 
Gaudet and Riker (43) found that the odd-even reliability coefficient of 
the Ferson-Stoddard Law Aptitude Examination, corrected by the Spear- 
man-Brown formula, was .97. Riker and Gaudet (100) administered the 
Ferson-Stoddard examination, Dearborn group test, Otis Self-Administer- 
ing test, and Inglis English Vocabulary test to 180 law students. The 
Ferson-Stoddard test had the highest correlation with law school 
grades, .34. Welker and Harrell (125) administered the Ferson-Stoddard 
and the Yale Legal Aptitude Test, with the American Council Psychologi- 
cal Examination, and the comprehension and speed tests of the Minnesota 
Reading Examination to 133 law freshmen. Correlations with average 
first-semester law grades and grades for five first-semester courses indi- 
cated that prelaw grades were better predictors than the tests, while tests 
of reasoning are superior to memory tests. 
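
The Spearman-Brown correction applied to the odd-even coefficient of the Ferson-Stoddard examination can be written out briefly. The sketch below assumes the usual form of the formula for a lengthened test; the function names are illustrative only, and the half-test correlation is back-computed from the reported .97 and is not a figure given by Gaudet and Riker.

```python
def spearman_brown(r_part, factor=2.0):
    """Estimate the reliability of a test `factor` times as long as the part test."""
    return factor * r_part / (1.0 + (factor - 1.0) * r_part)


def implied_half_test_r(r_full):
    """Invert the doubled-length case: the odd-even r implied by a corrected r."""
    return r_full / (2.0 - r_full)


# Back-computed illustration; the uncorrected odd-even r is not reported in the text.
implied_half = implied_half_test_r(0.97)
corrected = spearman_brown(implied_half)
print(f"implied odd-even r = {implied_half:.3f}; corrected r = {corrected:.2f}")
```

On this reckoning a corrected coefficient of .97 corresponds to a correlation of about .94 between the odd and even halves. The correction assumes that the two halves are equivalent, which is an assumption of the formula and not something the sketch can verify.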


Medicine 


Douglass (31) obtained the best predictor combination for medical 
school success (multiple R of .66) with premedical honor point ratio 
(corrected) and certain sections of the Minnesota Medical Aptitude Test. 
Herrmann (58) found that achievement in the Woman’s Medical College 
varies directly with standing on the (Moss) Medical Aptitude Test, and 
that the median appeared to be the critical score. Moss (92) reported an 
analysis of Form 13 of his Medical Aptitude Test, which showed a progres- 
sive increase in the number of failures in freshman medical year as one 
goes down the decile groups on the test, beginning with 1 percent for the 
highest decile of the test scores and amounting to 18 percent for the lowest 
decile. Stuit (115) found that liberal arts grade point averages were best 
predictors of first-year medical success, while the Moss Medical Aptitude 
Test did not predict with high precision. 


Nursing 


For predicting success in the school of nursing, Douglass and Merrill 
(32) found the best four-factor combination to be: high-school percentile 
rank, Moss Nursing Aptitude Test score, Cooperative General Science Test 
score (Part I), and score on the Douglass-Gordon Fraction Test (multiple 
R of .77). The multiple R for the first two factors alone was .75. 
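
That two further predictors raised the multiple R only from .75 to .77 is what the formula for a multiple correlation leads one to expect when predictors overlap. The sketch below gives the two-predictor case as an illustration; the three correlations used are hypothetical, since the validities and intercorrelations of the separate measures are not reported here.

```python
import math


def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple R of a criterion with two predictors, from the zero-order rs."""
    if abs(r_12) >= 1.0:
        raise ValueError("the two predictors must not be perfectly correlated")
    r_squared = (r_y1**2 + r_y2**2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12**2)
    return math.sqrt(r_squared)


# Hypothetical validities of .60 and .50, first with overlapping predictors
# (intercorrelation .60) and then with independent ones (intercorrelation .00).
r_overlapping = multiple_r_two_predictors(0.60, 0.50, 0.60)
r_independent = multiple_r_two_predictors(0.60, 0.50, 0.00)
print(f"overlapping: R = {r_overlapping:.3f}; independent: R = {r_independent:.3f}")
```

With these assumed figures the second predictor raises R from .60 to about .62 when the predictors overlap, but to about .78 when they are independent. A measure adds to prediction only to the extent that it carries information not already in the battery.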


Teaching 


Ryans (104) examined the scores of 4718 candidates on the second 
annual edition of the National Teacher Examinations. He found that 
extended teaching experience and possession of higher degrees gave a 
slight advantage. Intercorrelations of part scores indicated that verbal 
ability, nonverbal reasoning ability, and awareness of social problems are 
probably involved as common factors. 


Artistic Aptitude 


Beckham (6) gave a series of tests on fundamental abilities in visual art 
to three groups of Negro pupils. His results showed that intelligence is a 
factor in many art test items, that there were no significant age differences, 
and that sex differences were weighted in favor of boys, especially on 
line drawing. The Meier Art Test, I, Art Judgment (87, 88), is the 
successor to the Meier-Seashore Art Judgment Test. The 100 most dis- 
criminating items of the original 125 have been retained. Administration 
and scoring have been simplified, and the old norms are applicable. 
Mitchell (90) developed a Drawing Aptitude Test to measure visualizing 
ability, three-dimensional thinking, and skill in using a drawing pencil. 
It correlated .86 with mechanical drawing grades for 528 students in 
seventh to tenth grades. 


Music Aptitude 


Karlin (62, 63) reported two studies of music tests by factor analysis. 
Three factors were identified: pitch or tonal sensitivity, retentivity 
(memory factor for recalling isolated musical elements), and memory for 
form (or for remembering musical passages as a whole). Gross and Sea- 
shore (50) compared ten poor students of musical composition with ten 
good students and ten well-known American composers on a vocabulary 
test, a temperament scale (Humm-Wadsworth), and the Seashore music 
talent tests. 

Bienstock (14), Gilbert (47, 48), Taylor (117), and Woods and 
Martin (131) studied the Kwalwasser-Dykema tests. Bienstock found these 
tests too unreliable for individual prediction of grades in music courses. 
Gilbert analyzed differences between trained and untrained persons on 
the K-D tests, which are related to sex differences. In his first paper he 
presented revised norms (47). Taylor’s results, using the Seashore tests 
and her original Measures of Musical Background, as well as the K-D 
tests, agreed with those of Bienstock. Woods and Martin reported that 
Negro children are superior to white children in musical ability, girls 
are superior to boys (cf. Gilbert), and that type of community has a 
bearing on music talent scores. 
Lewis (75) gave a description of the timbre test which was substituted
for the consonance test in the revised edition of the Seashore Measures 
of Musical Talent. Saetveit, Lewis, and Seashore (106) made a complete 
report on the 1939 revision of this battery. Revisions were based on 
testing of fifth- to eighth-grade children and adults. Madison (81) 
developed an original phonograph-record test of ability to discriminate 
between musical intervals. The reliability coefficients ranged around .74, 
while the test correlated between .39 and .71 with indexes of musical 
ability at the secondary-school level and between .46 and .72 with grades 
in theory in a college of music. The test distinguished members of 
choruses from random student groups, and was claimed by the author to 
measure an ability fundamental to tonal relationships. Watkins (124)
developed a musical aptitude test for the cornet with reported reliability
of .95.


Visual Acuity 


Luckiesh and Moss (79) and Martin (83) described new apparatus 
for illuminating Snellen type charts. Hardy (53) presented a comprehen- 
sive discussion of tests for measuring the following seeing functions: light 
perception, perception of differences in light intensity and quality, per- 
ception of objects and form, depth perception, ability to see quickly, 
ability to see peripherally. Ferree and Rand (39) discussed the projec- 
tion method of testing acuity. This method has many advantages for the 
refractionist; but the increasing dark adaptation during the test period, 
the wide pupil and marked brightness contrast, make the method less 
valid for testing and rating acuity of vision. Ludvigh (78), in a study of 
reduced contrast on visual acuity, measured by Snellen test letters, found 
that reduction of light difference sensitivity to one-fifteenth normal would 
not reduce acuity below 20/40. On the usual Snellen charts contrast is 
approximately 93 percent, and the frequent small variations from this
amount are practically unimportant. Ferree and Rand (40) criticized the 
errors inherent in the capital letter acuity charts and presented an improved 
chart, based on Landolt’s broken circles. Betts and Ayers (13) presented 
new data on the reliability of readings taken on the ametropia slides of the 
Visual Sensation and Perception Tests of the Betts-Ready-to-Read Battery. 
Oak (97) and Sloane (110) presented data on the Massachusetts Vision 
Test for school children. Of 161 children passed by an ophthalmologist, 138 
passed the test; of 87 failed by the ophthalmologist, 81 failed the test. 
The test consists of a Snellen acuity test, a plus 1.50 sphere test (hy- 
peropia), and a muscle balance test (heterophoria, Maddox rod). The 
test kit provides for standard illumination of about 16 footcandles. 
Eames (36) presented a further description of the Eames Eye Test. 
Triggs and Sandt (122) found the Betts Ophthalmic Telebinocular Test 
superior to the Snellen Chart as a measure of visual acuity at the 
college level. The test of vertical imbalance, near point (Betts) was of 
questionable value, while the Betts tests of visual acuity for right eye 
and left eye were most valuable. Snell (111) summarized the current
status of standards of visual acuity for industry. In general, 20/40 acuity 
is the beginning of visual inefficiency, and 20/100 is the dividing line 
between serious inefficiency and total incapacity. Common labor may be 
aptly performed with an acuity of 20/200, semiskilled with 20/33 to 
20/65; but for highly skilled industrial work a binocular minimum of 
20/30 is necessary. 
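
The agreement figures reported above for the Massachusetts Vision Test can
be restated as screening rates. The following is a minimal sketch that treats
the ophthalmologist's judgment as the criterion, which is an assumption made
here for illustration; the variable names are not from the source.

```python
# Counts reported by Oak (97) and Sloane (110). The ophthalmologist's
# examination is taken as the criterion (an assumption for this sketch).
passed_by_examiner = 161
passed_both = 138          # passed by the examiner and by the test
failed_by_examiner = 87
failed_both = 81           # failed by the examiner and by the test

# Share of children failed by the examiner whom the test also failed.
sensitivity = failed_both / failed_by_examiner
# Share of children passed by the examiner whom the test also passed.
specificity = passed_both / passed_by_examiner
# Over-referrals: passed by the examiner but failed by the test.
over_referrals = passed_by_examiner - passed_both
# Misses: failed by the examiner but passed by the test.
misses = failed_by_examiner - failed_both

print(f"sensitivity: {sensitivity:.3f}")
print(f"specificity: {specificity:.3f}")
print(f"over-referrals: {over_referrals}, misses: {misses}")
```

Read this way, the test erred mainly on the side of over-referral rather
than of missing children the ophthalmologist had failed.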





Color Vision Tests 


Murray (93, 94, 96) contributed a number of able critical reviews of 
color vision testing. To meet shortcomings in current clinical tests, she 
recommended that test patterns be extended in two directions: first, into 
more saturated color regions, in order to isolate the completely red- 
green blind; and second, into weaker chromas, to detect borderline 
cases of high or low sensitivity to single critical colors. She developed 
sample tests, such as hue threshold charts, hue discrimination tests, and
hue identification tests to attack these problems. Murray has contributed
to the work of the Intersociety Color Council, as has Dimmick (28), who 
described the I. S. C. C. color aptitude test. It is designed to bring 
out degrees of facility in discriminating small color differences by 
normal subjects. It is concerned more with saturation judgments and 
less with hue. Hence it has much importance for industrial use in 
selection of personnel for such work as textile dyeing. This matching test, 
using two sets of color chips, employs Munsell colors mounted on a 
neutral gray background. A short form has been developed, using coarser 
intervals, which is suitable for detecting color blindness in school children. 
Sachs (105) pointed out the precautions which are necessary in the 
application of the projection method of group testing color vision 
deficiencies, developed by Berens and Stein (12). The American Optical 
Company published an American successor to the Ishihara and Stilling
tests, entitled “Pseudo-Isochromatic Plates for Testing Color Perception”
(3). These were criticized by Murray (94, 96) and by Gallagher,
Gallagher, and Sloane (41, 42). The latter developed revised ways of ad- 
ministering and scoring the plates. While no data on validity have 
been published, it is fair to note that the publisher makes no claim that 
the test will accomplish some things that its critics say it cannot do. 
Loken (76) developed the color-meter, a quantitative test based on an
adaptation of the Rayleigh Equation apparatus, which makes use of the 
phenomena resulting from mixtures of colored lights. Of 109 subjects 
tested, none making five or fewer errors on the American Optical test 
made a score on the color-meter or the Revised Nela Test, which exceeded 
the normal. Loken (77) also published a report on his revision of the 
Nela Test of Color Vision, a test composed of twenty-four triplets of colored
yarn, in which the subject is required to indicate which one of the outer
colors matches the center one. This report claimed that for some subjects, 
color vision scores were improved by vitamin A dosage. These findings 
were attacked by Murray (95). A new test for color blindness by Wiltberger
(129) is based on the discrimination of negative after-images,
which are compared with color chips. The plates are made of nonfading 
color chips of high chroma, high value, and accurate hue. The objectivity 
of this method is offset by the fact that scoring is entirely subjective. 


Auditory Tests 


Beasley (5) found that the correlation between hearing losses by air 
conduction is extremely high and regression is linear for the four tones 
64, 128, 256, and 512 cycles. For both screening and clinical types of 
audiometers, any one of these four tones will provide approximately 
equal predictive value as to acuity of hearing thruout this range. A 
tone in the region of 2000 cycles is recommended for prediction of ability
to hear speech. In hearing tests designed to discover early stages of 
high tone loss, more than two tones should be provided about 2048 
cycles. West (126) found the “numbers” group test of hearing acuity
inadequate. The reliability coefficient, based on 10,000 test scores, was 
low, and the correlation of this test (for 1000 subjects with low scores) 
with the 2A audiometer was .38. He recommended that group tests, using 
pure tones, be devised. Karlin (64) made a factorial study of 200 
high-school pupils’ scores on a battery of tests measuring pitch, loudness,
timbre, time, auditory analysis, synthesis, and memory. He found no 
general auditory factor, but nine group factors, identified as various 
auditory functions. Steinberg, Montgomery, and Gardner (114) pub- 
lished population averages of hearing acuity and hearing loss, in relation 
to age and sex, based on tests given at the New York and San Francisco
World’s Fairs. Voelker (123) summarized recent audiometric studies,
and reported that there is an increase in auditory acuity up to adolescence,
followed by a decline correlated with age. He suggested certain factors 
other than age which may be responsible for this decline. Kobrak (65) 
described a test, consisting of a graduated series of tuning forks, each of 
which can be heard by the normal ear for a given number of seconds. The 
curve of successive thresholds can be used to predict progressive hearing 
disability. He made several suggestions concerning technic in audiometer
testing. Bloomer (15) suggested the technic of attributing the sounds of 
the audiometer to animal pictures, and having the child respond by 
touching the animal’s mouth to stop, as a method of testing the hearing 
of young children. Westlake’s (127) results with 875 children, equally
distributed by one-year groups from three to eight years, indicated that the 
present accepted threshold norm on the 2A audiometer is adequate for the 
younger children. Greater variation among the younger children was 
attributed to factors other than auditory acuity. 


Mechanical and Manual Dexterity Tests 


Bennett and Cruikshank (8) summarized and reviewed critically 
twenty-one tests and discussed the problems of manual and mechanical 
ability testing in relation to vocational selection. The same authors (9)
reported significant sex differences, in favor of boys, on Bennett’s Test 
of Mechanical Comprehension, Form AA. Churchill and others (24) 
found that intensive training in mechanical courses significantly increases
scores on a mechanical aptitude test (Surface Development Test).
These findings contradict previous findings of Faubion, Cleveland, and 
Harrell (38), whose subjects had received less intensive training. Cook 
and Barre (25) published industrial norms for the Minnesota Rate of
Manipulation Tests, based on 468 male and 2007 female applicants.
Hanman (52) tested 785 men, ranging from twenty to sixty-five years, 
widely scattered in educational attainment, with the Minnesota Paper 
Formboard, Series AA, and the O’Rourke Mechanical Aptitude Test, 
Form C. The distribution of O’Rourke scores corresponded well with 
published norms, but scores on the formboard were generally lower 
than the published norms. Schultz (108) published a simplified revision 
of the Minnesota Paper Formboard, suitable for industrial use. Reed 
(99) reported the construction of a four-block test of mechanical ability 
with a retest reliability of .82. Crawford (26) designed a formboard 
assembly of nine variously shaped flat blocks in circular pattern to 
measure ability to perceive spatial relationships. Correlation of two 
forms was .89, while validity coefficients with shop and mathematics 
grades ranged from .59 to .91. Cardall (22) and Rusmore (103) developed
new pegboard tests of manipulative dexterity. Estes (37) studied
the intercorrelations of five spatial relations tests (block design—Belle- 
vue Intelligence Scale, Carl Hollow Square Scale, Crawford-Structural 
Visualization Test, and revised Minnesota Paper Formboard, AA) for 
seventy-six freshmen engineering students. His results indicated clearly 
a single common factor. Scores on spatial relations tests predicted suc- 
cess in descriptive geometry course; however, three-dimensional test 
material was not more valid than two-dimensional material. Teegarden 
(118, 119) published percentile norms for white male and female 
public employment office applicants, age sixteen to twenty-five, on the 
Kent-Shakow formboard, the Minnesota Spatial Relations Test, the
Minnesota Rate of Manipulation Test, and the Cincinnati Plier Dexterity
Test.


Tests of Gross Motor Abilities 


Seashore (109), using a wide variety of specific measures, including 
steadiness tests, measures of balance, and athletic tasks, found no positive 
or over-all dependence or intercorrelation of fine and gross motor abili- 
ties. Larson (69) administered a battery of gross motor ability tests to 
140 Springfield College freshmen. By means of factor analysis, he de- 
scribed the primary components of motor skill as dynamic strength, static 
dynamometer strength, gross body coordination and agility, motor educa- 
bility, motor explosiveness, and abdominal strength. Winograd (130) 
attempted to establish objective criteria for selecting players who would 
excel in batting ability. The Keystone ophthalmic telebinocular and a
timing instrument by Kelley were used to measure vision and timing; 
batting averages and other statistics to measure batting skill. Some sub- 
tests, such as directed timing, far-point lateral imbalance, and simultane- 
ous vision, distinguished reliably between varsity players and rejected 
candidates for the varsity. However, such measures are practically of 
value only when used in connection with the other criteria of batting 
ability. 

Wetzel (128) published tables of physical fitness in terms of physique, 
development, and basal metabolism. These tables were used to some extent 
as criteria of motor fitness by Hall and Wittenborn (51). Brouha (20) 
developed the step test which measures physical fitness in terms of “the 
general capacity of the body, especially the cardiovascular system, to 
adapt itself to hard work and to recover from what it has done.” This test 
gives results comparable with the treadmill test and the bicycle ergometer,
and is based on the same physical principles.


Clerical Aptitude Tests 


Schneidler (107) published new norms for the Minnesota Vocational 
Test for Clerical Workers, including tables of condensed grade norms,
and decile points for girls and boys separately in Grades VIII thru XII as 
well as age norms for test I and II. Jurgensen (60) published a new test 
for selecting and training industrial typists. It contains a typewriting abil- 
ity analysis which was standardized on 381 applicants for industrial second 
year of typing and validated on the performance of sixty-seven employees. 
Time, error, and combined scores are obtained. Ghiselli (44) compared 
the Minnesota Vocational Test for Clerical Workers with the general cleri- 
cal battery of the United States Employment Service. Thru an analysis of 
the scores of 562 clerical workers he concluded that only a slight increase 
in efficiency could be obtained thru the addition of other tests to the Min- 
nesota Clerical Test. Crissy and Wantman (27) described the measuring 
procedures used in connection with the National Clerical Ability Testing 
Program. The basic battery of machine-scored miniature tests of objective 
type measures abilities in the areas of stenography, typing, machine
transcription, bookkeeping, machine calculation, and filing. Moore (91)
developed a new general clerical test which measures accuracy in spelling,
simple arithmetic, memory for all instructions, name and number checking,
vocabulary, usage, arithmetic reasoning, and copying accuracy. Hay (56, 
57) has published several reports on the selection of office machine op- 
erators and clerical workers with the aid of test batteries. 


Vocational Selection 


The results of several studies suggest that tests may well be used in 
connection with other measures, that the definition and selection of ade- 
quate and appropriate criteria is one of the most difficult problems, and 
that tests can be used more successfully when they are specifically adapted 
to the particular job. Ayers (4) found that tests of specific visual abilities
are of value in appraising capacity to inspect textiles. Specific vision tests 
cited in this connection were diagnostic stereopsis, ametropia, visual acuity, 
accommodation, binocular fusion, near blur points, vertical phoria, and 
lateral phoria. Blum (16) studied prediction of success of employees in 
a watch factory. Criteria of proficiency were length of employment, sal- 
ary ratio, and foremen’s ratings. Predictions based on a combination of 
O'Connor finger and tweezer dexterity test had a higher correlation with 
criteria than interviews alone. Ghiselli (46) and Blum and Candee (18, 
19) confirmed each other’s results which showed that scores on the O’Con- 
nor Finger Dexterity and Minnesota Placing Tests have no value for the 
prediction of success of department store packers and wrappers. Blum and 
Candee found, however, that the Minnesota Clerical Test gave a fairly good 
prediction. Blum (17) developed a sewing machine test for predicting sew- 
ing machine operators for a glove manufacturer. He studied this test 
and the MacQuarrie Test for Mechanical Ability, with supervisors’ ratings 
and earnings as criteria. Critical scores were obtained for both tests which 
differentiated between significant proportions of “good” and “poor.” Ben- 
nett and Fear (10) obtained satisfactory predictions for machine opera- 
tion jobs using the Bennett Mechanical Comprehension Test and the Hand 
Tool Dexterity Test. Tiffin and Rogers (121) selected tin-plate inspectors, 
obtaining positive results with vision tests administered stereoscopically, 
including visual discrimination for distance and near, and vertical bal- 
ance. Ghiselli (45) used the Minnesota Paper Formboard and Pegboard, 
and the MacQuarrie Copying and Dotting Tests, to predict successful in- 
spector-packers (in a pharmaceutical supply house) with a multiple R of 
.72. Gottsdanker (49) and Koran (66) described a number of tests for the 
selection of calculating machine and office machine operators. 

Kornhauser and Schultz (67) summarized research on the selection of 
salesmen. They pointed out that effective selection procedures must be 
worked out in relation to the particular type of selling. Tests may con- 
tribute useful practical improvements in methods, and they may consider- 
ably increase the probability of selecting successful salesmen. Kurtz (68) 
reported that a combination of personal history items and Kornhauser’s 
Test of Personality Characteristics gave a good prediction of life insurance 
sales production. Ohmann (98) developed a rating form for use in a con- 
trolled interview employed in selection of salesmen. He also developed a 
test consisting of three parts: difficult sales situations, verbal and pictorial 
descriptions of building maintenance problems, and arithmetical calcula- 
tions. Dougan and Gory (30) described a study by the Cincinnati Person- 
nel Department of the application of the merit system to unskilled labor. 
The department devised practical demonstration tests for waste collection 
helpers and garagemen, and a labor adaptability test for street cleaners. 
Dorcus (29) in a review of methods of evaluating the efficiency of door-to- 
door salesmen of bakery products presented a comprehensive discussion 
of the problem of validating criteria for such selection. Lawshe (72, 73) 
developed a test to aid in selecting persons for an industrial training
program. Studies of industrial trainees were reported by Ross (102). Harrell
and Faubion (54, 55) reported the results of selection tests for aviation 
mechanics in the Army Air Corps Technical Schools. These included a 
number of standardized mechanical and special abilities tests. 


Driving and Flying Aptitudes


Driving—Allgaier (1,2) found parallel parking in twenty feet to be a 
differentiating test of driving ability. Tests of distance judgment, resistance 
to glare, and hand steadiness ranked next in discriminatory value. Lauer
(70) reported that driving aptitude can be determined by systematic ob- 
servation. He recommended that licensing bureaus give learners’ permits 
at fifteen years of age, junior licenses at sixteen, and senior licenses at 
eighteen, and that they cooperate more closely with schools that offer 
driving instruction. Lauer and Allgaier (71) reported the results of a 
series of tests on professional and nonprofessional drivers. No signifi- 
cant differences were shown altho the complete results were considered 
useful as guides in remedial programs. An important study was made by 
the United States Public Health Service on fatigue and hours of service of 
interstate truck drivers. The following chapters report significant informa- 
tion in relation to aptitude testing: Channell (23), on driving and glare 
tests; Lee (74), on critical fusion frequency of flicker; Specht (113), on 
eye movements and related phenomena, using a modification of the ophthal- 
mograph; and Wulfeck (132), on psychomotor reactions. 

Aviation—Jenkins (59) reviewed the work of the Committee on Selection
and Training of Aircraft Pilots of the National Research Council
with respect to criteria and standards of performance, selection and 
classification of pilots, improvement of methods of instruction, response to 
stress and strain. Matheny (84) had fifty-five college students in the civilian 
pilot training program rated as to final flying ability by the flight
operator and the flight examiner. He correlated their scores on the Kelley
spatial insight test and the Henmon-Nelson Test of Mental Ability, together 
with a work load factor and a rating for interest in aviation with their 
flying ability ratings, and obtained a multiple R of .7. Matheny (85) ob- 
tained a low correlation of .13 between test results and ratings later as- 
signed by the flight operator and flight examiner. He attributed these re- 
sults to the homogeneity of the group, to a ceiling effect, and to variance 
in motivation. 
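
The effect of group homogeneity mentioned above can be illustrated with a
small simulation. This is only a sketch under assumed conditions: the data
are randomly generated, the population correlation of .7 and the "top fifth"
selection rule are hypothetical, and no attempt is made to reproduce
Matheny's figures.

```python
import math
import random


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(0)
rho = 0.7  # hypothetical test-criterion correlation in an unselected group
test, rating = [], []
for _ in range(20000):
    x = random.gauss(0, 1)
    y = rho * x + math.sqrt(1 - rho ** 2) * random.gauss(0, 1)
    test.append(x)
    rating.append(y)

r_full = pearson_r(test, rating)

# Keep only the top fifth on the test, mimicking a homogeneous group.
cutoff = sorted(test)[int(0.8 * len(test))]
kept = [(x, y) for x, y in zip(test, rating) if x >= cutoff]
r_restricted = pearson_r([x for x, _ in kept], [y for _, y in kept])

print(f"unselected group: r = {r_full:.2f}")
print(f"top fifth only:   r = {r_restricted:.2f}")
```

Under these assumptions the correlation within the selected group is
markedly lower than in the unselected group, although the underlying
relationship between test and criterion is unchanged.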


CHAPTER V


Current Construction and Evaluation of Personality 
and Character Tests 


ARTHUR E. TRAXLER 


THIS CHAPTER presents an overview of the new personality tests published
during the period from July 1940 to July 1943, and summarizes the re- 
search appraising personality tests in general, except those belonging under 
the heading of projective technics. Chapter VII is concerned with pro- 
jective methods in the study of personality. Research on tests of interests 
and attitudes will be included in this chapter. 

Summaries and bibliographies—In the last number of the Review deal- 
ing with “Psychological Tests and Their Uses,” published in February 
1941, the writer presented a summary and bibliography of personality and 
character tests covering a three-year period. The bibliography included 
157 titles. In the same number of the Review, Rothney and Roens published 
a three-year summary of studies concerned with applications of personality 
and character measurement, including a bibliography of 103 titles. Snyder 
(77) made a survey of studies in the measurement of personality, attitudes,
and interests of adolescents covering a five-year period. Super (79)
reviewed 147 research articles dealing with the Bernreuter Personality
Inventory. Duffy (20) reviewed critically the investigations in which the
Allport-Vernon Study of Values Test or other tests of evaluative attitude
had been employed.


New and Revised Tests 


Tiegs, Clark, and Thorpe (83) have recently made available four additional
series of the California Test of Personality: a Primary Series for
Grades I-III, an Intermediate Series for Grades VII-X, a Secondary Series 
for Grades IX-XIV, and an Adult Series for Grade VII to the adult level. 
Each battery has two main parts designed to measure self-adjustment and 
social adjustment. Within each part are several subtests the results of 
which may be graphed to form a personality profile. The test is adapted 
for machine scoring. The split-half reliabilities of each of the two main 
parts and of the total adjustment score are in the neighborhood of .9. The 
purpose, development, validity, reliability, and interpretation of scores on 
this test have been discussed by Tiegs, Clark, and Thorpe (84) and the 
nature of the test has also been outlined in another publication by Tiegs 
(82). 

In 1941, Science Research Associates published The Personal Audit, by 
Adams and Lepley (2). The test is designed to measure nine aspects of 
personality: sociability or extroversion, suggestibility, irritability,
rationalization or alibi tendency, anxiety or fear tendency, sexual emotional
conflict, personal tolerance, flexibility of attitudes, and thought intensity or
worry over unsolved problems. There are two forms—S, consisting of the 
first six parts, and L, consisting of all nine parts. The Spearman-Brown re- 
liabilities of all parts are reported as .90 or above, and the intercorrela- 
tions of the parts have been found to be low. A description and appraisal 
of this test was published by Adams (1). A study of the interrelationships 
between the Adams-Lepley Personal Audit and the Bernreuter Personality
Inventory was reported by Tubbs (87), whose results tended to
substantiate those of the test authors.
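
Spearman-Brown reliabilities of the kind reported above are stepped up from
the correlation between two halves of a test. As a sketch of the standard
formula, the snippet below shows the step-up and its inverse; the half-test
value of .82 is hypothetical and is not a figure reported by the test
authors, and the function names are illustrative.

```python
def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Predicted reliability when a test is lengthened by length_factor."""
    return length_factor * r_half / (1 + (length_factor - 1) * r_half)


def half_test_r_needed(r_full: float) -> float:
    """Half-test correlation that steps up to r_full at double length."""
    return r_full / (2 - r_full)


# A hypothetical half-test correlation stepped up to full length.
print(round(spearman_brown(0.82), 3))
# Half-test correlation implied by a full-length reliability of .90.
print(round(half_test_r_needed(0.90), 3))
```

A stepped-up coefficient of .90 therefore corresponds to a correlation of
roughly .82 between the two halves.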

The Minnesota Multiphasic Personality Inventory, by Hathaway and 
McKinley (43), was published in 1943. It is a revision of the Minnesota 
Multiphasic Personality Schedule, which was made available by the same 
authors in 1942. The revised form has scoring keys for hypochondriasis,
depression, hysteria, psychopathic personality, masculinity-femininity,
paranoia, psychasthenia, and schizophrenia. The Inventory, suitable for
subjects over sixteen years of age who are able to read, is intended to be
a psychiatric measuring device for general medical practice. The con- 
struction and evaluation of the Schedule were set forth in several ar- 
ticles by McKinley and Hathaway (42, 61, 62). 

In 1941, Watson and Fisher (34, 92) reported the construction of An 
Inventory of Affective Potency and An Inventory of Affective Tolerance. 
These authors understand affective potency to have three aspects— 
strength, duration, and number of affective experiences. A scale of fifty-four 
items was constructed for the measurement of this dimension of
emotionality.

By affective tolerance, Fisher and Watson mean the capacity of an indi- 
vidual to cope successfully with emotional situations. They constructed a 
sixty-one item Inventory of Affective Tolerance, which was published by
the Sheridan Supply Company. They reported split-half reliabilities above 
.9 for this Inventory. Two studies of the results of the Inventory of Affective 
Tolerance were reported by Watson (90, 91). 

The Minnesota Personality Scale, prepared by Darley and McNamara 
(19), was published by the Psychological Corporation in 1941. This test, 
which has separate forms for men and women, is designed to measure 
morale, social adjustment, family relations, emotionality, and economic 
conservatism. 

Guilford’s Inventory of Factors S T D C R (39), which was mentioned 
in the last number of “Psychological Tests and Their Uses” and which
was at that time printed privately by the author, is now available in
published form from the Sheridan Supply Company.

Other recently published personality inventories which have not as yet 
been extensively evaluated are the Detroit Adjustment Inventory, by Baker 
(3); the Wilson Scales of Stability and Instability (94); My Personality
Growth Book for Junior and Senior High Schools, Colleges, and Adult
Groups (60); A Self-Rating Scale for Leadership Qualifications (8); the
Nash-Hunsicker Personality Scale (47); the Social Personality Inventory
for College Women (59); a questionnaire for appraising personal and
social adjustment, by Sheviakov and Block (75); Every-Day Life, by Stott;
and the Inventory of Social Behavior, by Weitzman. The last two inventories
were briefly described by Leuenberger (54).

Several unpublished tests of personality and character were described in 
articles published during the period under review. Raymond B. Cat- 
tell (10) discussed the nature and use of an objective test of temperament 
designed to measure quickness of decision, resourcefulness, excitability, 
restraint, and other characteristics. E. H. Hsu (45) described the Con- 
struction of a Test for Measuring Character Traits and presented the 
results of using it with juniors and seniors in a women’s college. He iden- 
tified four “superfactors” which were independent of one another and 
which resembled closely schizophrenic and manic-depressive syndromes. 
Thompson (80) described an inventory for measuring the trait of sociali- 
zation-self-seeking and studied its correlation with the Allport-Vernon 
Study of Values Test, an intelligence test, and certain scales for the Strong 
Vocational Interest Blank. Edwards (23) developed an anxiety scale for 
six broad areas, using a word list based in part on the Pressey X-O Test. 


Studies of the Bernreuter 
Personality Inventory 


Several studies were added to the extensive bibliography on the Bern- 
reuter test. Feder and Baer (29) questioned the validity of the Bernreuter 
Personality Inventory thru a study which indicated that the clinical exami- 
nation of behavior records of maladjustment did not agree with the inven- 
tory scores. 

Hampton (40) experimented with the substitution of simpler synonyms 
for certain words in the Bernreuter Inventory when it was used with retail 
grocers. The changes resulted in greater understanding and indicated that
revision in the vocabulary of the inventory was desirable if it were to be 
used with adults of limited education. 

Reed (70) concluded on the basis of an investigation with college fresh- 
men that the Bernreuter Inventory did not provide a sufficient basis for 
guidance in regard to choice of college courses or of a vocation, and that 
the Thurstone Vocational Interest Blank did not provide a sufficient basis 
for guidance in choice of college courses. A study by Ward and Kirk (89) 
indicated that the scores of college freshman students on the Bernreuter 
Inventory were almost unrelated to later practice teaching grades or to 
ratings by critic teachers. 

Ruch (73) found that college students were able to influence markedly 
their scores on the Bernreuter Personality Inventory, and he developed an 
honesty scale based on Bernreuter responses under two conditions.


Studies of Other
Personality Inventories 


Ryans and Peters (74) found that the twenty-three most discriminating 
items on the Bell School Inventory measured the adjustment of college 
women as adequately as the seventy-six items in the entire test. Marsh (58) 
studied the diagnostic value of the Bell Adjustment Inventory for College 
Women. His data indicated that the results were not very sensitive to cases 
of maladjustment until they were so bad as to be considered critical. Trax- 
ler (85) obtained fairly high reliability for the scales on both Bell inven- 
tories, but he found that the correlations of the scores with teachers’ ratings 
of the same aspects of adjustment tended to be low. Clark and Smith (13) 
studied the correlations of scores on the Bell Adjustment Inventory and 
the Washburne Social Adjustment Inventory with ratings of students by 
faculty members and counselors. The results suggested that the inven- 
tories were not valid indicators of student adjustment in the school situa- 
tions at the institution where the study was made. 

Humm and Wadsworth (46) outlined some general rules for the use and 
interpretation of the Humm-Wadsworth Temperament Scale. Poole (68) 
indicated that the Humm-Wadsworth Temperament Scale was a useful 
supplement to the regular interview and selection practice among em- 
ployees at the Lockheed-Vega plant in Burbank, California. 

Beck (6) selected seventy-seven items from the Terman-Miles Mascu- 
linity-Femininity Test, and found that the scores on this short form were 
highly enough correlated with the scores on the total test to justify the 
use of the short form in making group comparisons. Revised scoring pro- 
cedures were proposed by Giffen (36) for the Pressey X-O Test. Revised 
scoring weights were developed by Congdon (15) for the Heilman Per- 
sonal Data Scale. Roslow (72) summarized the results of the Link Per- 
sonality Quotient Test and concluded that “there is a form of behavior 
represented by social cooperation and leadership which is measured by 
this test.” 

Harris (41) reported a study which indicated that the Maller Case In- 
ventory did not appear to measure those traits which are related to 
successful behavior adjustment in a state school for boys. Kuhlen (53) 
found that the Pressey Interest Attitude Test was useful as a descriptive 
measure of personality among college girls, but that its validity as an 
emotional maturity measure at that level was doubtful. 


Consistency of Responses to Personality Inventories 


The question of consistency of responses has an important bearing 
on the validity of personality inventories. Pintner and Forlano (67) in- 
vestigated the consistency of response to personality tests at different age 
levels. They found that consistency increased slightly with age, altho the 
children were not considered inconsistent. Eisenberg and Wesman (25) 
found that 85 percent of the responses to a psychoneurotic inventory were 
consistent, and that the consistent responses were usually interpreted 
consistently and logically. However, in a study reported in another article 
Eisenberg (24) concluded that “our results indicate that at least for exist- 
ing questionnaires of the Yes-No type, individual variation in interpre- 
tation of items is so serious that the questionnaires cannot have much 
individual validity no matter what the r’s with criteria.” 


Factor Analysis in the Study of Personality 


Interest in the application of factor analysis to the study of personality 
was continued in the period under review. Wolfle (95) discussed the con- 
tribution of factor analysis to the isolation of important variables of 
human personality. Koch (48) factored the correlation matrix based on 
several measures of the behavior of preschool children and extracted nine 
factors, of which three (immaturity, artistic tendency, and an unnamed 
Factor V) appeared to be virtually independent. 

Goodman (38) carried on a factor analysis of twenty personality questions 
which were asked of college students and obtained three factors, two 
of which were named “sociability” and “self-reliance.” Darley and Mc- 
Namara (18) reported the establishment of five new experimental scales 
for measuring behavior thru the factor analysis of test and retest per- 
formance on thirteen attitude and adjustment scales. Ferguson, Hum- 
phreys, and Strong (33) performed a factor analysis on the scores of 
ninety-three male college students on eight scales for the Strong Voca- 
tional Interest Blank and six scales for the Allport-Vernon Study of 
Values Test, and classified the interests into five orthogonal types. 

Altho introversion and extroversion have frequently been treated as 
unitary characteristics, Reyburn and Taylor (71), using Thurstone’s cen- 
troid method with ratings on ten characteristic traits of introversion, ob- 
tained four factors: emotional and conative stability, a factor identical 
with Cattell’s desurgency, sociability, and a factor related to perseveration. 
Evans and McConnell (27) devised a test which gave relatively inde- 
pendent measures of three types of introversion and extroversion: think- 
ing, social, and emotional. 


Various Studies of Technics of 
Personality Measurement 


Lorge and Thorndike (56) showed that the verbal replies in associa- 
tion and completion tests tend to be unrelated to the actual behavior of 
the individual. In a second article, these authors (57) found that the 
reliability of responses to a free association test was rather low. 


Interest Inventories 


Studies of Strong Vocational 
Interest Blank 


Peterson and Dunlap (66) proposed a simplified method for scoring 
the Strong Vocational Interest Blank in which all weights were reduced 
to +1 or -1. They showed that the correlations between scores predicted 
by this method and original scores were high. Kogan and Gehlmann (49) 
carried on an independent study with the test blanks of college freshmen 
which tended to validate Peterson and Dunlap’s method. Lester and Trax- 
ler (55) applied the simplified scoring method to the interest blanks of 
two hundred secondary-school boys and obtained results closely in agree- 
ment with those which Peterson and Dunlap secured with more mature 
subjects. 
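
The simplified scoring idea can be illustrated with a minimal sketch. It assumes a hypothetical key of integer item weights and simulated responses; neither is taken from Peterson and Dunlap's report, and the function names are illustrative only. Each weight is reduced to its sign (+1 or -1, with a zero weight left at 0), and the unit-weight scores are then correlated with the originally weighted scores.

```python
import math
import random

def unit_weight(w):
    """Reduce an item weight to +1, -1, or 0 according to its sign."""
    return (w > 0) - (w < 0)

def score(responses, weights):
    """Sum the weights of the items a respondent endorsed (responses are 0/1)."""
    return sum(w for r, w in zip(responses, weights) if r)

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: 40 items weighted -4..+4, 200 simulated respondents.
rng = random.Random(0)
weights = [rng.randint(-4, 4) for _ in range(40)]
people = [[rng.randint(0, 1) for _ in weights] for _ in range(200)]

original = [score(p, weights) for p in people]
simplified = [score(p, [unit_weight(w) for w in weights]) for p in people]
r = pearson_r(original, simplified)
print(round(r, 2))
```

With real blanks the size of the correlation would depend on how the original weights are distributed; the simulated figure printed here says nothing about the published results.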

Skodak and Crissey (76) showed that the “A” ratings of high-school 
senior girls on the Strong Vocational Interest Blank for Women tended to 
be concentrated in the four occupations of stenography, office work, home- 
making, and nursing, and they pointed out that this fact limited the value 
of the test in counseling at that level. 

Tussing (88) studied the possibilities of measuring personality traits 
with the Strong Vocational Interest Blank and concluded that certain as- 
pects of personality, such as self-confidence and sociability, could be pre- 
dicted fairly accurately with certain keys for the Strong test. 

Burnham (9) investigated the stability of interests as measured by the 
Strong Vocational Interest Blank for Men. He found that occupational in- 
terest scores were more stable than college grades but less stable than 
psychological test scores. 


Studies of Other Interest 
Inventories 


Form BB of the Kuder Preference Record, published by Science Re- 
search Associates in 1942, is a revision of the form first published in 1939. 
The revised form yields a profile of preference scores in nine areas in- 
stead of seven. No studies of the revised record are available for the period 
covered by this review. 

Traxler and McCall (86) reported data for the unrevised Preference 
Record which tended to support Kuder’s results. The retest reliability of 
the scores was found to be fairly high. Patterns somewhat similar to those 
identified by Kuder were found for various vocational preference groups. 

Peters (65) computed intercorrelations among certain scales on the 
Kuder Preference Record and the Strong Vocational Interest Blank for 
Women. He found five intercorrelations that he believed should be given 
serious consideration by personnel workers and counselors. However, 
most of the correlations were fairly low. None of them was as high as .6. 
Froehlich (35) studied the Gentry Vocational Inventory and obtained re- 
sults which indicated that “the test should be more carefully standardized 
and evaluated before it is used in a counseling situation.” Glaser and 
Maller (37) discussed weaknesses in the Allport-Vernon Study of Values 
Test and proposed a substitute test, the Interest-Values Inventory, which 
they believed embodied certain improvements. 


Relationship between Self-Estimated 
and Measured Interests 


The relationship between the interests of freshman women as measured 
by the Strong Vocational Interest Blank and their self-estimated interests 
was studied by Bedell (7), who found no close relationship. He argued 
that measured interests are more valid than self-estimates, altho the latter 
were not necessarily without significance. Crosby and Winsor (17) corre- 
lated self-estimated interests of 222 Cornell students with their scores on 
the Kuder Preference Record. The average r was .54, which was con- 
siderably higher than the average correlation obtained by Bedell for self- 
estimated interests and the Strong blank. Moffie (64) compared self- 
estimated interests with scores on the Strong Vocational Interest Blank 
for Men. He found that the scores for only one group and for only two 
specific occupations showed high enough relationships with estimated 
scores to be significant. 


Measurement of Attitudes 


Studies Based on Thurstone Technic 


Several new scales constructed according to Thurstone’s technic for 
attitude scale construction were reported during this period. Cohen (14) 
described the construction of two forms of a scale for attitude toward 
the esthetic value and presented preliminary results. Howard and Robert- 
son (44) constructed a scale for measuring the effects of science instruc- 
tion upon the scientific attitude. Dudycha (21) used the Seashore-Hevner 
modification of Thurstone’s method in the preparation of a scale em- 
bodying eight aspects of dependability. The final scale, which consisted 
of twenty-seven statements, was called “Attitude Scale for Clerical 
Workers.” 

Ferguson (31), who had previously reported the application of factor 
analysis to six Thurstone attitude scales leading to the identification of 
two general or primary factors, which he called Religionism and Humani- 
tarianism, carried on a study in which he reisolated these two factors from 
an appropriate matrix. He concluded that the two primary factors were 
operationally stable attitudes. Ferguson (32) reported further factor 
analysis leading to the isolation of a third factor, Nationalism, and con- 
structed a scale for the measurement of the new factor. In another article, 
Ferguson (30) compared the Likert and Thurstone methods of attitude 
scale construction, and concluded that Likert’s technic did not obviate 
the need for a judging group. 

Ericksen (26) questioned the value of Peterson’s “Attitude Toward 
War Scale,” one of the more widely used scales constructed according to 
the Thurstone technic. He found that on the average the attitude scores 
of university students toward war remained practically constant between 
May 1940 and February 1941, notwithstanding the movement of world 
events during that period. 


Fluctuations in Attitudes 


Cattell (11, 12) studied the fluctuation of attitudes as a measure of 
character integration. His results indicated that both with children and 
with adults, fluctuation tendency is a consistent trait of the individual and 
that the fluctuation measure is significantly related to the “W” factor of 
character integration and stability. His data indicated that the tendency to 
fluctuate could be measured with fair consistency by a test of sixty items. 


Other Studies of Attitude Measurement 


On the basis of Remmers’ technic of scale construction, Bateman (5) 
constructed two scales for measuring attitude toward any educational 
program, and reported the results of using them with two hundred high- 
school students. The correlation between Scale A and Scale B was .87. 
Corey (16) described a technic for deriving an attitude scale of the simple 
sort for measuring the attitudes of elementary-school children in the 
classroom, and made suggestions concerning its use. Mitchell (63) con- 
structed two scales for the measurement of attitudes of pupils toward 
education, school, and school practices, and reported the results of using 
the scales in one high school. 

Ewing (28) devised an instrument for measuring the extent to which 
various social groups in a culture are in conflict with each other, and 
reported results showing that the reliability based on two presentations 
of this instrument was high. 


Measurement of Persistence 


The data on his test of persistence previously presented by Ryans were 
reexamined by Thornton (81), who criticized Ryans’ conclusions and 
questioned the validity of his test. 

Kramer (50), using six tests of persistence, found six factors by centroid 
analysis: a will factor, stability of character, sense of inferiority and 
compensation for it, intelligence, will to community, and reliability. 


CHAPTER VI 


Applications of Personality and Character 
Measurement 


JOHN G. DARLEY and GORDON V. ANDERSON 


THE PAST TRIENNIUM has witnessed considerable growth in the significance 
and sophistication of research studies involving the use of personality 
and character measurement. Of the actual research titles to be reviewed 
in the first five sections of this chapter, 14 percent are in the field of occu- 
pational studies; 19 percent show the effect of societal factors on per- 
sonality measurements; 29 percent evaluate educational problems and out- 
comes in terms of measured personality factors; 15 percent apply research 
methods to clinical problems; and 23 percent deal with causative or con- 
ditional factors in personality structure. These classifications and the 
assignment of articles to them represent a gross index of the quantity of research 
within five important types of application. 


Personality Measurement in Occupational Problems 


Five studies are typical of selection and classification research in in- 
dustry. Dodge (22) used a personality inventory to differentiate the 
traits of 192 better and poorer clerical workers employed in four different 
companies representing three different industries. The group differences 
are not at the critical ratio level of 3.0 but are consistent in defining the 
better workers as more subject to worry, less emotionally reactive, less self- 
sufficient, nonsocial, not so desirous of responsibility or admiration. Cor- 
relations between the final form of his test and ratings for company sub- 
groups ranged from .30 to .64. Otis (64) found the Strong Vocational In- 
terest Inventory sales keys and personal data items to be more clearly 
related to detergent sales success than the Bernreuter Personality Inventory 
for a small group of demonstrator-salesmen. Bills (8) reported a follow-up 
of 700 casualty and life insurance salesmen in terms of their score levels 
on two sales keys of the Strong test and of their weights on personal data 
items. Using criteria of production, policy cancellations, and insurance 
school success, clear group differences are found among high- and low- 
scoring groups. Hampton (44) reviewed the personality characteristics 
found by three investigators to be related to sales success. Dominance and 
extroversion emerge as significantly related to sales success in each of the 
studies cited by Hampton. Moore (59) reported massed test data on almost 
10,000 adult students in Engineering Defense Training Courses, with a 
special reference to distributions of Strong Vocational Interest Test scores 
among occupational subgroups. These data are to be followed up and 
reported in greater detail; they represent significant material on voca- 
tional choice as determined by measured personality patterns in a 
relatively free condition of opportunity for training. 

The significance of each of these studies is to be found in the consistency 
with which they show rather definite and characteristic personality pat- 
terns among occupational groups; employers, guidance workers, and 
classification officers generally can extend their selection procedures to 
encompass these patterns, as well as patterns of ability, aptitude, and 
achievement. 
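
The critical ratio used as a significance criterion in studies such as Dodge's is commonly computed as the difference between two group means divided by the standard error of that difference, with 3.0 the conventional threshold mentioned above. The following is a minimal sketch under that assumption; the two score lists are invented for illustration, the function name is hypothetical, and nothing here reproduces Dodge's data or his exact procedure.

```python
import math
from statistics import mean, variance

def critical_ratio(group_a, group_b):
    """Difference between means divided by the standard error of the difference.

    Assumes independent groups; uses sample variances (n - 1 denominator).
    """
    se_diff = math.sqrt(variance(group_a) / len(group_a)
                        + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se_diff

# Invented scores for "better" and "poorer" workers on one inventory scale.
better = [14, 17, 15, 19, 16, 18, 15, 17]
poorer = [13, 15, 14, 16, 12, 15, 14, 13]
cr = critical_ratio(better, poorer)
verdict = "meets 3.0 criterion" if abs(cr) >= 3.0 else "below 3.0 criterion"
print(round(cr, 2), verdict)
```

A set of differences can fall short of the 3.0 criterion on every scale and still be consistent in direction across subgroups, which is the pattern reported for the clerical workers.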

Evans and Wrenn (35) related three dimensions of introversion-extrover- 
sion to the grades and practice teaching ratings of seniors in education. 
Thinking introversion accompanied by high ability tends to be related to 
higher grades, whereas social and emotional extroversion tend to be related 
to success in practice teaching. Ward, Remmers, and Schmalzried (81) 
had pupils rate their practice teachers twice as one method of improving 
teacher performance. The forty practice teachers showed a significant 
improvement in ratings; the authors also demonstrated that students can rate 
independently of their own received grades, and that pupils and supervisors 
agree closely in their ratings of practice teachers. 

Freeman and others (39) described the development of a standard “stress 
interview” as a device to test the changes in behavior of applicants for 
police jobs under normal and disruptive interview relationships. Their re- 
port is an interesting methodological summary of considerable pertinence 
to students of interview technics. 

Walker (80) reported on the use of the Terman-Miles Masculinity- 
Femininity Test in a prison classification system. While the test appears 
too cumbersome for routine use, type of offense and prison behavior show 
some relation to M-F scores. 

Super and Roper (77) set up an objective method for the measurement 
of vocational interests, based on the assumption that interest is a prime 
determinant of attention and memory, which in turn are reflected in 
achievement testing after interest is stimulated. The technic has promise 
as a method of assaying certain levels of interest which may not be reached 
by other technics. 

Bergen (6) discussed a most crucial use of personality measurement 
in studying the attitudes of employees under wartime pressure in manu- 
facturing plants. He listed high morale conditions and conflict situations 
that have been found in such testing programs. 

These articles, covering factors of selection, classification, employee 
relations, and job success, bring to industrial fields an emphasis on per- 
sonality and interpersonal relations which has been less clear-cut in the 
past. 


Personality Measurement in Societal Problems 


The impact of societal changes and societal structures is undoubtedly 
reflected in the attitudes and personal adjustments of individuals. Therefore, 
such a research classification may highlight a field of study worthy of more 
intensive analysis. 

Chapin’s study (15) of the social effects of good housing is an example 
of such research. Using an experimental group of former slum families 
accepted in a public-housing project and a control group of families con- 
tinuously resident in slum districts, he found significant increments in 
social participation, status, neatness, and effective use of space for the 
experimental group in comparison to the control group. However, on 
measured morale and general adjustment, neither group changed sig- 
nificantly from the test to retest period. The multi-dimensional nature of 
the criterion of social betterment programs is clearly demonstrated in 
this study. 

Mitchell (58) compared the responses of Rotary Club members, school 
teachers, and high-school students on a scale devised to measure attitudes 
toward the press. High-school students show the least favorable attitudes of 
the three groups. The scale was hastily standardized; no significant figures 
are presented; and the N’s are not reported. It is unfortunate that a good 
idea was not buttressed by adequate research treatment. Eberhart and 
Bauer (28) investigated recall of the Republic Steel Strike in Chicago, 
with special reference to the effect of the stand taken by the Chicago 
Tribune. By use of a multiple-choice set of items, attitudes were elicited 
in the guise of factual recall of the event thirty months later. Stereotypy 
of response to controversial issues was checked as a control factor by use 
of the same technics in geographical areas outside the circulation range 
of the Tribune. Testimony before the LaFollette Committee served as the 
touchstone of factual accuracy. Respondent bias was checked by use of 
an attitude scale toward labor as a potentially dominant social group. 
Labor attitudes, sources of information, and nature of recall are generally 
in agreement, with no evidence that the Chicago Tribune has a unique 
effect on recall. 

Nelson (62) contrasted students by college classes, by college types, 
by college geographical distributions, by sex, and by fathers’ occupa- 
tions in terms of their responses to the Thurstone scales relevant to religious 
observance and beliefs. Over 3500 students from eighteen colleges appear 
in the sample. All groups fall at or above the neutral points, in the direc- 
tion of favorable beliefs. Subgroup differences are all in the directions 
which would be predicted from cultural and sociological factors. A more 
penetrating study is the one by Sappenfield (70), who analyzed the effect on 
social attitudes of Protestant, Catholic, or Jewish allegiance. Additionally 
he asked each group to estimate the attitudes of the other two groups on 
various social issues, as a basis for educing evidence of stereotypy in inter- 
group relations. Liberalism, birth-control attitudes, war attitudes, and 
communism attitudes were the social issues studied. Analyses of variance, 
critical ratios, and correlational data are appropriately used not only in 
specifying intergroup and intragroup differences in self-attitudes and 
attitude estimates, but also in extending a research method of great promise 
in social-psychological investigations of group conflict. 

Miller (57) reported a special analysis of morale data in the follow-up 
study of 951 former University of Minnesota students. The contribution 
to morale of impersonal factors in the structure of the environment and of 
personal factors in the individual’s reactions to his adult status are segre- 
gated, with the generalized conclusion that the morale of the college- 
trained adult is predictable from specific impersonal and personal factors. 
Age, size and regularity of income, stability and type of occupation, 
hours of work, are some of the related factors in adult morale in normal 
times. 

Cronbach (17, 18, 19) established a measure of group and individual 
optimism for the future as an index of the wartime morale of high-school 
youth. The desirable relationship of this continuum to realistic 
predictions of events to follow in the train of war is one of neither 
overoptimism nor overpessimism. The contrast between group and individual morale 
so defined is pointed out as a challenge to realistic teaching and guidance 
by the schools. These are some of the most stimulating of the current re- 
searches on the effect of war, altho the author sometimes goes beyond his 
data in the acceptable generalizations set forth. 

Stagner (74) reported the construction and application of a scale for 
attitude toward war, during 1937-38. Here attitudes toward war are treated 
as a special extension of the broader trait of nationalism. Among the sig- 
nificant findings were the following: military training, veteran’s group 
membership, and conservative political connection are related to greater 
militarism; labor and professional men are more pacifistic than clerical 
and business men; generally discarded beliefs about war causes are held 
only by the more militaristic men; military preparedness and neutrality 
laws were viewed as war preventives by the militaristic men. This study, 
in its emphasis on war as a special instance of a broader attitude, is a 
fundamental approach to the problem, particularly as it points out the 
personal determinants of the special attitude. 

Katzoff and Gilliland (50) reported bimodal distributions, general ods 
toward participation, and clear regional differences in monthly use of 
a scale to measure attitude toward American participation in the war. This 
study was carried out prior to America’s entry into the war. Miller (56) was 
fortunate in completing repeated testings of college students before and 
after the U. S. war declaration. He found a spurt in morale immediately 
after the declaration of war, followed by successive lessening of personal 
morale. Remmers (68) set up a cross-section comparison of 1935 and 
1942 Purdue students in terms of their attitudes toward Germans, Nazis, 
Jews, and Japanese. With the exception of attitudes toward Jews, each 
group is viewed significantly less favorably over the seven-year period. 
In the case of Japanese and Nazis the decrease in popularity is accompanied 
by a significant increase in homogeneity of attitude response. This is similar 
to the evidence for stereotypy shown in Sappenfield’s study, and may well 
reflect intensive propaganda and hostility outcomes of wartime national 
attitudes. 

Duffy (27) reported a 1935 analysis of daughter-parent response of 
thirty-eight private college students to the Thurstone war and treatment 
of criminals scales. Critical ratios show the parents to be similar on the 
average in both attitudes, but the daughters to be significantly more 
liberal than either parent in both attitudes. Correlational analysis shows, 
as usual, closer correspondence between parents, with particular reference 
to war attitudes, than between daughters and either parent. The relative ef- 
fect of parents as attitude and personality determinants still remains to be 
established in more extensive researches than this one. 

Dudycha (25) summarized studies on war attitudes, with particular 
reference to 1930-41 published reports of the Droba and Peterson Scales. 
Consistent pacifistic tendencies characterize all the studies reviewed. Tech- 
nical measurement problems are briefly discussed also. 

Space limitations preclude a review of many recent wartime morale 
studies of large scope. In general, however, the studies cited here are typical 
of the major trends of the three years under review. 


Personality Measurement in Educational Problems 


The technics of attitude and personality measurement are increasingly 
being applied as tests of outcomes of education, with interesting results. 
Eleven research studies out of the twenty-three to be discussed use atti- 
tude or personality tests to measure outcomes of specific classroom experi- 
ences; eight more deal with the personal outcomes of over-all educational 
programs; and four are concerned with miscellaneous educational 
problems. 

The effect of college class-work in social sciences on attitudes is the topic 
of researches by Billings (7), Fitch and Remmers (38), Rackley (66), and 
Whisler (82). Studies of the same problem among high-school students 
are found in reports by Von Eschen (79), Mason (52), and Bateman and 
Remmers (5). Each of the authors reported gains in the direction of greater 
liberalism and tolerance, regardless of the measuring instrument used, 
the educational level studied, or slight modifications in teaching methods. 
From the research standpoint, the studies by Mason, and Bateman and 
Remmers are the most incisive and significant of this group. Mason first 
classified social science teachers as liberal or conservative, then equated 
the students of both types of teachers on original attitude scores, and 
finally studied the retest performance of students trained by each type of 
teacher; the pupils of liberal teachers gained in liberalism, while those 
of conservative teachers lost slightly. Differences in mean scores were 
significant. The study was paralleled in rural communities with the same 
general results. The effect of the community in choosing liberal or con- 
servative teachers was also discussed. Bateman and Remmers also departed 
from the routine design of experiments of this type by tracing attitude 
shifts thru four administrations of the same scale: before negative propa- 
ganda; after negative propaganda; after positive propaganda; and 
after an interval of two additional months. Their high-school seniors 
shifted with each propaganda emphasis, and finally tended to return 
to their positions immediately after the negative propaganda stimulation. 
Bimodality was the characteristic distribution at each of the three retest 
periods. These two studies touch on crucial aspects of specific educational 
technics. 

Eckert (29) and Gilkinson (41) both reported significant measured per- 
sonality changes as the result of college speech courses. Both studies use 
control data. Eckert’s study involved a wider range of personality test 
variables. 

Breemes, Remmers, and Morgan (9) reported cross-section data for 1931, 
1933, 1937, and 1939 students at Purdue, showing general gains in liberal- 
ism as a composite outcome of many contributing factors. They also re- 
ported further supporting evidence for the positive relation between in- 
telligence and liberalism. Bugelski and Lester (10) reported test- 
retest increase in liberalism from freshman thru senior college years, with 
maintenance of senior liberalism in a second retesting three years later 
for some of the original group. Group differences by major subject, re- 
ligious preference, and sex divisions were reported. Droste and Seyfert 
(24) studied war attitudes in follow-up samples of graduates of a military 
academy, finding no excessive militarism as a residue of such training. 
Comparison of the current senior class in 1940 and a similar group in a 
nonmilitary preparatory school showed mild pacifism for both groups. 
Epstein (34) tested high-school seniors and junior control cases prior to 
a four-day field trip which took the seniors to government-financed 
projects. Retesting showed significant changes to liberalism for the seniors 
and none for the control group of juniors. 

Two studies, by Jersild, Goldman, and Loftus (48), and by Jersild and 
others (49), contrasted behavior patterns of students in “activity” versus 
“old-fashioned” elementary schools in New York City. Both groups 
reported the same worries over school work, failure, or punishment; the 
“activity” school group tends to surpass the control group in academically 
desirable behavior patterns. These findings are generally confirmed by the 
study of the same questions in other New York City schools reported by 
Morrison (61). 

Gilbert (40) reported no effect of high-school science courses in elimi- 
nation of prejudices and interpretation in the face of biased data. In 
contrast, Emme (33) showed clear modification and reduction of super- 
stitions after specific instruction in a college psychology course. Gilbert's 
negative findings are essentially a function of poor experimental work and 
poor measures of bias in information. 

Martin (51) reported median percentiles on the California Test of Per- 
sonality for graduating groups of Los Angeles development schools. The 
mean IQ of the group is 67; all personality test medians are somewhat 
below the median scores for the norm populations. 

Morgan and Ojemann (60) used test-retest methods before and after 
a learning program aimed at understanding of marital, family, and 
social relations. College and noncollege groups were used as the experi-
mental groups, and each was matched with a control group. The experi-
mental group showed reduction in personal conflict scores and improvement 
in relevant attitude scores. This is a clear-cut report of a carefully designed 
experiment in special instruction. 

In the light of the generally insightful level of research represented by 
the foregoing studies and by studies to be mentioned later, the discussional
article of McCall and Herring (54) regarding the inadequacy of present
personality tests seems ill-advised and captious. 


Personality Measurement in Clinical Work 


In the study of individual adjustment problems, personality and character 
tests seem to be increasingly used. Clinical work, almost by definition, 
eventuates in research reports less frequently than massed statistical 
analyses of groups; yet even within this restriction there are twelve titles 
sampling this area of investigation over the past three years. 

Crook (20) reported eleven test-retest correlations for college women 
on the Thurstone Personality Schedule. The intervals range from five months
to six and one-half years, and the drop in magnitude of correlation is from 
.78 to .50, with a negatively accelerated end-point. The author also dis- 
cussed these findings in relation to similar data for other types of tests and 
in relation to odd-even reliability versus actual changes in the behavior 
under analysis. Fisher and Hayes (37) studied Bernreuter scores and birth- 
order of twenty-five college women chosen by the resident psychiatrist as 
serious maladjustments out of a total population of 438 students. Low but 
significant biserial coefficients and poorer mean scores support the thesis 
that the maladjustment was predictable at the time of college entrance. 
Cavanaugh (14) related measured personality status to extracurriculum 
participation and interests of college students, finding that the better ad- 
justed individuals tend to have greater participation and interests in par- 
ticipation than the less well adjusted. This result, substantiated increasingly 
in past years, should tend to restate the idea that activities produce better 
adjustment; better adjustment actually produces participation in activities. 
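
The distinction drawn in Crook's discussion between odd-even reliability and test-retest correlation can be illustrated with a minimal sketch. The scores below are invented for illustration and are not data from any study cited here; the function names and the layout of `item_scores` are hypothetical, and the step-up formula shown is the usual Spearman-Brown correction, which the review does not name.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full test."""
    return 2 * r_half / (1 + r_half)


def odd_even_reliability(item_scores):
    """item_scores: one list of item scores per person (hypothetical layout)."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


# Invented total scores for five persons tested on two occasions.
first_testing = [12, 18, 9, 22, 15]
later_testing = [14, 16, 11, 20, 19]
retest_r = pearson_r(first_testing, later_testing)
```

Odd-even reliability reflects consistency within a single sitting, whereas `retest_r` also absorbs any real change in the persons between sittings, which is why the two need not agree over long intervals.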

Preston (65) reported a clinical study of 200 children, subdivided into 
four groups by degrees of addiction to overstimulating movies and radio 
programs. A wide range of emotional and health disturbances are found to 
be related to increased addiction; addiction, in turn, was found to be 
related to poorer home standards. 

Aldrich (2) reported the results of individual therapy aimed at diagnosed 
social maladjustments. The naturalness of her setting, the use of control 
groups, and the measured increments in social competence among the ex- 
perimental cases combine to provide an excellent example of research 
methods applied to clinical problems. 

Child and Sheldon (16) found no relation between test variables and 
somatotypes as classified by Sheldon’s methods. They tend to attribute this 
to the superficial nature of the test instruments; the possible superficiality 
of somatotyping might also be mentioned as a contributing factor. Burn-
ham (12) reported case studies of three pairs of identical twins, including
personality measures, which show less similarity than other measures for 
each pair. 

Increasing consonance between measured interests and dominant ex- 
pressed interests, together with clearly defined interest patterns as early as 
Grade X, is found by Carter, Taylor, and Canning (13) who continue their 
significant reports of the California Adolescent Study Project with a sum- 
mary of three testings of the same cases in Grades X, XI, and XII. Burge- 
meister (11) tested and retested college women in their freshman and 
sophomore years with the Strong Interest Test, the Allport-Vernon Study of 
Values, and the Lecky Individuality Record. Permanence of interests seems 
to be related to withdrawn or nonsocial attitudes, to age, and to successful
academic accomplishment. Withdrawn attitudes favor permanence of 
esthetic interests and values most clearly. 

Adkins and Kuder (1) studied the relations between Thurstone’s Primary 
Mental Abilities Test and Kuder’s own preference record, finding relatively 
little relation between the two sets of factors. Darley (21) made an intensive 
analysis of the clinical use of the Strong tests in college counseling work. 
He attempted to cover the major problems of interpretation of interest 
measurement; there is general need for this type of treatment for more 
of the widely used personality and character tests. Traxler (78) has revised 
an earlier monograph on the use of personality tests in which he provides 
mainly an annotated list of test titles. 


Personality Measurement in Theoretical Problems 


Probably the most significant use to be made of personality measurement 
in the past three years is found in studies dealing with the structure, de- 
velopment, and forces of personality. It is no accident that attitude scales 
are frequently used in such studies, since attitudes have come to be viewed 
as outcomes of total personality patterning, mediating between behavior 
and the basic personality patterns. Rather than assuming that the measured 
attitude exists as an entity, the trend involves a breakdown and an analysis 
of the factors producing the attitude; rather than creating a trait by naming
the measuring instrument, the trend involves the grouping of tests around 
functional behavioral patterns. The range of ages is from the preschool 
years to adulthood, and the result is a far more integrated picture of per- 
sonality than has been reflected in the past. 

Maurer (53), Read (67), and Richards (69) all reported studies of the 
personality of preschool children. Richards factor-analyzed a matrix re- 
ported by Ball and Roberts for the Merrill-Palmer Personality Rating 
Scales and found the three personality traits of self-sufficiency, conformity,
and general likeableness. Maurer devised two synonymous adjectival lists 
which were used by fifty raters with a one-week interval. Factor analysis 
yielded extreme conformity, nonconformity, and sociability as the dominant 
trait clusters. These two studies show considerable similarity of results and 
have significance in indicating the early age of development of differentiable
personality patterns. Read reduced the California Behavior Inventory from 
231 to 67 items and found it to be a reliable instrument for measuring in- 
dividual children and recording teachers’ awareness of personality differ- 
ences. If this scale shows dominant factors akin to the Maurer and Richards 
studies, considerable evidence for the stability of a few personality traits 
in young children will have accrued. 

Meltzer (55) reported another in his series of studies of nationality and 
race attitudes of American children. One thousand children, 125 of each 
sex in Grades V thru VIII, expressed their feelings of liking or disliking 
twenty-one nationalities on a five-step scale, and supported these ratings 
with reasons for the feelings expressed. For each grade the orders of pref- 
erence and intensity of feelings for the twenty-one nationalities were highly 
correlated, and highly similar to comparable rankings among adults. A 
large amount of stereotypy is also present at each grade level, independent 
of curriculum experiences or age differences. Classification of reasons for 
feelings showed that only the personal concepts of security and freedom, 
and the societal concepts of defensive war and sympathy for the underdog 
showed any grade increments. In the area of attitudes Meltzer has studied, 
the child’s similarity to the adult is so close as to represent a causal factor 
against which enlightened education will have difficult going. 

Sears (71) reported a significant study of the personality correlates of 
levels of aspiration among fourth-, fifth-, and sixth-grade students ob- 
jectively classified as uniformly successful, uniformly failing, and highly 
successful in reading while failing in arithmetic. Characteristic aspiration 
levels were found in these three groups, and rather clear personality differ- 
ences emerge in favor of the successful group with its own reasonable aspira- 
tion level in contrast to the failure group with its unattainable and in- 
flexible aspiration level. This study is excellently designed and reported; 
and its age range makes it additionally significant as a basic study in per- 
sonality structure and development. Hilgard (46) used Sears’ study as a 
point of departure in urging realistic goals, success experiences, and re- 
duced social pressures as essential phases of education if personality 
structure is to be strengthened. 

Edwards (30, 31) reported two excellently designed studies of the effect 
of attitudes on learning and retention. Subjects classified as pro-new-deal, 
anti-new-deal, and neutral responded to a true-false achievement test im- 
mediately after a specific speech concerning the New Deal and communism. 
They were retested twenty-one days later for the study of retention. The 
original attitude or frame of reference clearly determines what is learned 
from the speech and tends to determine what is retained or forgotten. 
Edwards (32) and Stagner and Katzoff (75) reported studies of the 
“Fascist attitude” and its components. Stagner and Katzoff isolated three 
broad determinants of this complex as being “aggressive nationalism” and 
two separable elements related to “middle-class consciousness” and “pro- 
tection of property rights and lack of sympathy for the unfortunate.” Their 
factor analysis was based on college students and is again disconcerting
evidence of the rigid patterns with which education must deal. Edwards 
devised a scale in which the presumably subtler principles of fascism are 
evaluated; he referred his findings back to his earlier concept of responses 
congruent with an established frame of reference in the individual. 

A possible clue to the origin of this frame of reference is found in the 
study by Alpert and Sargent (4) who used the immediate emotional 
response of like-dislike to word stimuli, somewhat in the form of a con-
trolled-association test. Correlations of conservatism scores derived this
way with conservatism scores in the more traditional attitude scale are high 
enough to indicate a generalized conservatism factor, based on blind emo- 
tionalized response rather than critical evaluative behavior. Hartmann (45) 
documented this interpretation from yet another angle in his analysis of 
reasons for pacifism or nonpacifism as given by mature graduate students 
who wrote essays defending their positions. He attributes the irreconcilable 
and illogical differences to personality factors “which weight some experi- 
ences in a different way than they do others.” 

Graham (43) attempted to demonstrate that generalized attitudes de- 
termine reactions to specific controversial issues, and derives J-shaped 
curves when these generalized attitudes are evoked by a specific issue. He 
discusses personal adjustment in relation to his eight generalized attitudes, 
postulating superior or adequate adjustment as a concomitant of conforming 
general attitudes (42). His data are not presented with full statistical im- 
plementation and his eight generalized attitudes are logically rather than 
experimentally postulated, but the relation of personal adjustment to atti- 
tude position is a vital topic. 

Smith (72) studied fraternity and sorority groups to determine the effect 
of length of group association on attitude homogeneity in six of the 
Thurstone Scales. By the author’s own admission the study is merely ex- 
ploratory and inconclusive because of lack of controls and small samples. 
The problem merits better experimental design, such as Newcomb (63) 
provides as a byproduct of his larger and more significant study at 
Bennington College. He used a single attitude scale to study personality 
correlates in a homogeneous and liberal environment. In addition to the 
general relations between rated prestige, rated community-cooperativeness,
and liberalism, he analyzes his data to bring out relatively clear qualitative 
personality differences among the accepted and nonaccepted groups in the 
community. He introduces in this analysis the dimension of the individual’s
awareness of his own attitudinal status, a factor frequently neglected in 
this type of research. 

Fay and Middleton (36) reported an interesting analysis of parental 
influence on attitudes of college students. They break down parental mem- 
bership in widely-known organizations and organization types as a basis 
for studying children’s attitudes, getting results consonant with cultural 
expectations of the attitudes of adult organizations. 

Doob (23) interviewed all students who were tested and retested on a
single attitude scale, to determine reasons for changes in those cases where 
a real change was judged to have occurred. The proportion of real changes 
was quite small and tended to involve intervening external events. 


Many of the foregoing studies pave the way, both in methods and hy- 
potheses, for clear, crucial investigations of more segments of personality. 
It is to be hoped that the next three-year period will see an increase in 
the percentage of studies classified here as clinical and theoretical, for it
is likely that the clearest understanding of the workings of personality will
emerge in such data. 


General Review of Personality Tests 


For the investigator desiring more extended references to specific tests, 
the following articles may be significant. Hoppock and Shaffer (47) re- 
viewed twenty-two studies of job satisfaction for the year 1940-41. Snyder 
(73) made a general survey of the studies of adolescent personality, classi- 
fied by methods used. Duffy (26) analyzed twenty-one references in which 
the Allport-Vernon Study of Values was used. Super (76) synthesized
147 articles dealing with that hardy perennial, the Bernreuter Personality
Inventory. 


No review would be complete without a mention of Allport’s monograph 
on The Use of Personal Documents in Psychological Science (3), which 
makes a justifiably strong case for broadened horizons and methods in 
personality research. Those who are concerned with the study of personality
in clinical and research situations will find this book a valuable source of
ideas and hypotheses. 
CHAPTER VII


Projective Methods in the Study of Personality 


PERCIVAL M. SYMONDS and MORRIS KRUGMAN 


INTEREST IN PROJECTIVE TECHNICS for the study of personality has
grown apace during the three years since the last issue of the REVIEW 
devoted to psychological tests and their uses. Considerable work, in par- 
ticular, has been done in developing the Rorschach method along a 
number of important lines. Extensive work has also been done with the 
Thematic Apperception Test. 

This review articulates with the review by Symonds and Samuel in 
the February 1941 issue. This chapter will discuss the various projective 
methods roughly in order of the importance and number of references 
devoted to them. 


General Papers 


Three papers have appeared in which projective technics have been 
discussed critically and methodologically. Macfarlane (84) has crit- 
icized projective methods from the point of view of their validity. She 
warns of the danger of overgeneralizing from the application of a single 
projective technic to the personality of the subject in general. She also 
suggests that projective methods be subjected to careful validating studies 
in order to test empirically the extent to which their results correspond 
to other descriptions of personality. Rapaport (105), in a survey of 
projective methods, attempted to provide a rational basis for calling 
them “projective.” Rosenzweig (111) has discussed the value of projective 
technics in the study of fantasy. 

Three studies have explored the value of a number of different projec- 
tive technics. One by Lerner and Murphy (76, 96) discusses in some detail 
a number of projective methods used in the study of personality develop- 
ment in young children at Sarah Lawrence College. In the monograph 
there are experiments in free play, using toys, dough, and cold cream as 
plastic materials. L. J. Stone in this same monograph reports experiments 
on the use of balloons as a method of studying aggressive and destructive 
impulses. He also has experimented with group play technics. Lerner de- 
scribed his series of active play technics in which the experimenter par- 
ticipates with the child by setting before the child certain stimuli in 
order to find out how the child will respond. These latter methods help 
to reveal the child’s ego patterns in interpersonal relationships. The 
other general investigation, by Wolff (130), presents his experiments in
the judgment of various forms of human expression, such as speech, gait,
and handwriting, and in judgments of the hands, the profile, and left-left and
right-right photographs of the face, which he calls experimental studies
in depth psychology. He is interested in the way an individual judges his
own unrecognized forms of expression.

Wolff (131) has more recently tried out the value of expressive move- 
ments for understanding the personality of preschool children. In the 
study reported he has employed static positions of the body, dynamic 
body movements, manipulations of plastic material, finger paintings, 
brush paintings, and pencil drawings as projective media. 


Rorschach 


The past three years have witnessed a tremendous growth in the use 
of the Rorschach method. Numerous as the published reports on this 
technic are, they do not begin to indicate the extent of its clinical or 
other uses. Research workers usually publish their results, but clinical 
workers do so only rarely. The two hundred titles published on the 
Rorschach during the period under review, therefore, represent only a 
small part of the interest in the method. 


General 


Until 1942 Beck’s manual (10) was the only publication on the 
Rorschach in book form in English. In 1942, two new manuals (16, 68) 
were published and Rorschach’s original monograph was translated into 
English (110). Lemkau and Kronenberg’s translation of the “Psycho- 
diagnostik” meets a need that has been felt in this country since 1921. 
Altho several private, unauthorized translations have been in existence 
for years, many Rorschach workers have had to study the method from 
second-hand material; now the original source is available to all. Boch- 
ner and Halpern’s manual (16) is an attempt to simplify the method, 
and the result is an oversimplification that may be misleading to the 
beginner. Klopfer and Kelley’s book (68) is a thoro treatment of the 
technic; in addition to a consideration of the development of the method 
and some general methodological problems, adequate treatment is given 
to the technics of administration, scoring, tabulation, and interpretation. 
A considerable portion of the book is devoted to the various clinical en- 
tities as reflected in the Rorschach, and a single case is elaborately pre- 
sented. Klopfer and Kelley included in their presentation many of the 
refinements of the Rorschach technic which the senior author has, for 
almost ten years, been presenting orally and in writing. The bibliography 
of 362 titles was the most complete available to January 1942. 

A recent issue of the Journal of Consulting Psychology was devoted 
entirely to the Rorschach Method and contained eight reports by repre- 
sentative Rorschach workers. In this issue Frank’s introduction (35) 
dealt with the Rorschach as a projective technic; Hertz (54) discussed 
the scope of the method, described attempts at establishing reliability and 
validity, and presented a critical evaluation of the method; Krugman 
(74) discussed the uses of the Rorschach in child guidance and in other
work with children; Munroe (95) described an experiment in student
guidance at Sarah Lawrence College employing abbreviated methods for
determining potential maladjustment; Piotrowski (102) wrote of the 
Rorschach in vocational selection, considering both the general applica- 
tion of the method for this purpose, by means of matching personality 
factors against vocational requirements, and the specific application by 
means of Rorschach “signs” that are found to possess predictive value 
for specific occupations; Beck (11) considered the use of the Rorschach 
in psychopathology, presenting in considerable detail Rorschach findings 
with schizophrenics; Klopfer (67) described training opportunities and 
training methods in the Rorschach; and Harrower-Erickson (48) presented 
the group Rorschach, and compared it with the individual application. 
Thruout the articles ran two parallel threads: the demonstrated uses 
of the Rorschach in the various phases under discussion, and the limitations 
of the Rorschach method in these fields. This issue of the Journal of Con- 
sulting Psychology should prove useful to psychologists for background 
material on the Rorschach, but cannot serve as a manual for learning 
the method. 

The Rorschach Research Exchange is a mimeographed quarterly, first 
published in 1936 by the Rorschach Institute and now completing its 
seventh volume. It serves as a clearing-house for new developments in 
the Rorschach, and, in the past three years, contained 40 percent of the 
Rorschach articles appearing in American psychological and psychiatric 
journals. Selected references from this periodical will be found under 
appropriate headings. 


Historical 


Altho Klopfer and Kelley (68) gave considerable space to the his- 
torical treatment of the Rorschach, Hertz (55) and Krugman (75) dealt 
with the development of the method in more detail and included dis- 
cussions of areas in which the Rorschach has been found useful, limita- 
tions of the method, and problems not yet solved. 


Methodological 


There has been less controversy about scoring Rorschach categories 
during the past few years than was the case in previous years. Many 
of the differences in scoring are unimportant. Klopfer (68), for example, 
does not use F+ percent, but is careful about F-; the net result in inter-
pretation is identical, since the relative number of poorly perceived forms
is of diagnostic significance, and the F+ percent is the complement of the
percent of F- responses. In other cases, the difference lies mainly in the symbol
used rather than in the concept, as in scoring of shading responses. The 
differences in scoring that do exist do not alter the personality interpreta- 
tion materially. Hertz (56, 57), who has given considerable attention 
to methodological problems, reviewed the results of her work on scoring 
in the Rorschach Research Exchange and summarized prevailing opinion 
about the shading response. 

In the main, Rorschach workers have not been too friendly to an ex-
clusively statistical approach. The importance of norms, of objective
scoring when possible, of partial standardization, and of validation is
readily conceded by all, but many fear that a complete psychometric
approach would decrease the value of the method as a dynamic instrument. 
Zubin (134, 135) has made several attempts to promote the psychometric 
approach to the Rorschach, but, to date, has not had many followers. 

One of the difficulties with mental hospital patients is their refusal
or inability to respond to all the cards. This, of course, is diagnostically
important, altho additional responses would be useful. Kelley and others 
(66) have employed intravenous sodium amytal in prenarcotic doses, 
and have been successful in obtaining a more complete Rorschach picture. 


Norms 


Considerable research is in progress to determine the Rorschach pat-
terns that may be expected of various age groups and of each sex at these
age levels. Klopfer and Margulies (69), studying the Rorschach records 
of 155 children two to seven years of age (205 records), found that the 
use of W, as the only response, decreases with age, while the use of M,
FM, and CF increases with age. Paulsen (99) found similar results 
with a group of eighty-two six- and seven-year-olds. The unexpected use of
W among young children, a component usually associated with maturity 
and intelligence, is accounted for by the fact that it is an “undifferentiated” 
W, that is, a vague, formless response rather than the organized response 
of the adult. Stavrianos (121) studied 131 five- to eleven-year-old 
children and found that all of them likewise overemphasized W, that 
boys matured steadily with age, while girls passed thru an impulsive 
period between the years of seven and nine, in addition to the usual finding 
that girls mature at an earlier age than boys. Hertz (53), in four suc- 
cessive reports, has presented the most detailed and elaborate discussion 
of adolescents and the Rorschach, particularly with respect to M, C,
and experience balance. In each of these categories, the seventy-six sub- 
jects, forty-one boys and thirty-five girls, were studied in the most minute 
detail, and sex differences were noted. The subjects were compared at 
age twelve and age fifteen. This ambitious study is apparently still in 
progress and probably will continue to be reported in sections. 


Validity and Reliability 


The various methods usually employed to establish validity were 
reviewed by Hertz (58), who made a plea for the combination of qualita- 
tive and quantitative approaches, emphasizing, however, that the method 
must be validated as a whole by the utilization of experimental and clinical 
technics and by taking into account dynamic relationships of the total 
personality. J. I. Krugman (72), utilizing Vernon’s matching technic 
with twenty-five child guidance clinic cases, had seven judges match in- 
dependent Rorschach interpretations with each other, Rorschach protocols 
with interpretations, and, finally, Rorschach interpretations with clinical
case study abstracts. The Rorschach was found to have a high degree of 
reliability and clinical validity. Fosberg (34), after administering the 
Rorschach to sixty-six subjects, asked fifty of them to falsify their re- 
sponses on a second administration, and the other sixteen were required 
to look for determinants pointed out by the examiner. In neither case 
was the second set of responses materially different from the first. Fosberg, 
therefore, concluded that the test-retest reliability of the Rorschach is high. 
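
Results from a matching technic such as the one Krugman used are usually judged against what blind guessing would yield. The sketch below computes that chance baseline by exact enumeration. The set size of five is purely an assumption for illustration, since this review does not state how many records each judge matched at one time, and the function names are hypothetical.

```python
from itertools import permutations
from math import factorial


def expected_chance_matches(n):
    """Mean number of correct pairings when n records are matched blindly."""
    total_correct = sum(
        sum(1 for position, item in enumerate(perm) if position == item)
        for perm in permutations(range(n))
    )
    return total_correct / factorial(n)


def chance_of_perfect_matching(n):
    """Probability that a blind matching of n records is entirely correct."""
    return 1 / factorial(n)


# Hypothetical set size; the review does not report the actual figure.
set_size = 5
baseline = expected_chance_matches(set_size)
all_correct = chance_of_perfect_matching(set_size)
```

Whatever the set size, blind matching yields one correct pairing on the average, so a judge who regularly pairs most of a set correctly is performing well above chance.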


Psychopathology 


The Rorschach has been employed in practically every phase of 
psychopathology, including the neuroses, psychoses, epilepsy, psychopathic 
personality, organic brain conditions, mental deficiency, and a variety of 
other psychopathies. One of the current trends in this field is the de- 
velopment of differential “signs” for diagnosis and for the prediction 
of the outcomes of therapy. Harrower-Erickson (46, 50) and Miale and 
Harrower-Erickson (89) have developed nine signs that are helpful 
in diagnosing neurosis when used in conjunction with the qualitative 
interpretation of the Rorschach. These are R, FC, M, FM to M, F percent,
A percent, color shock, shading shock, and refusals. Five or more
of these signs were found in 80 percent of 74 diagnosed neurotics but
in only 15 percent of 385 control subjects. Ross (112) studied 236 subjects
for the presence of Piotrowski’s organic signs and the Miale, Harrower- 
Erickson neurotic signs, and concluded the organic signs are not neces- 
sarily indicative of cerebral lesion, but of a disfunction of the nervous 
system that may be either organic or functional, while the neurotic signs 
are indicative either of psychoneurosis or a basic personality insecurity. 
Piotrowski (100) has used the Rorschach to determine the probable out- 
come of insulin therapy in schizophrenics and found that the quality 
of the C and M responses is closely related to improvement after therapy. 
Halpern (43) came to the same conclusion, but added productivity, 
chiaroscuro responses, and human responses as significant. Harrower-
Erickson (49) found that patients with cerebral tumors showed marked
constriction on the Rorschach as compared with “normal” individuals. 
Arluck (4) studied twenty idiopathic epileptics, using three different 
equated control groups, and found that the epileptics presented a more un-
favorable picture on the Rorschach with respect to W, F percent, pro-
portion of FC to CF + C, time per response, and constriction. Brussel,
Grassi, and Melnicker (18) reported that in the case of sixteen patients 
with postconcussion syndromes and various clinical diagnoses, the verbal 
Rorschach and the graphic Rorschach showed close agreement with diag- 
nosis resulting from neuropsychiatric examinations. 
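
The figures reported for the neurotic signs (five or more signs in 80 percent of 74 diagnosed neurotics and in 15 percent of 385 control subjects) can be turned into approximate head counts. The sketch below does only that arithmetic. The counts are approximate because the percentages are rounded as reported, the function name is hypothetical, and the share of flagged persons who are neurotic applies only to a pooled group with this particular mix of cases and controls; it is not a figure given in the studies.

```python
def screening_summary(n_cases, hit_rate, n_controls, false_alarm_rate):
    """Approximate counts implied by reported percentages (a rough sketch)."""
    flagged_cases = n_cases * hit_rate
    flagged_controls = n_controls * false_alarm_rate
    flagged_total = flagged_cases + flagged_controls
    return {
        "flagged_cases": flagged_cases,          # neurotics showing 5+ signs
        "flagged_controls": flagged_controls,    # controls showing 5+ signs
        "share_of_flagged_who_are_cases": flagged_cases / flagged_total,
    }


# Percentages and group sizes as reported in the text above.
summary = screening_summary(74, 0.80, 385, 0.15)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Because the control group is about five times the size of the neurotic group, a 15 percent rate among controls flags nearly as many persons as an 80 percent rate among neurotics, which is consistent with the authors' caution that the signs be used together with qualitative interpretation.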


Cultural Differences 


Because of ease of administration and other advantages, the Rorschach 
is now widely used in anthropological and cultural studies. Hallowell 
(40, 41, 42) believes that this technic has considerable value in the
study of cultural variables and comparative social psychology. In a study of 
two groups of American Indians, one with greater, and the other with 
lesser acculturation, Hallowell found personality differences on the
Rorschach that corresponded to the known cultural differences. Schachtel,
J. Henry, and Z. Henry (117) found similar correspondence between 
blind Rorschach analyses and ethnological facts in the case of Pilaga 
Indian children. These, and other studies, emphasize the need for utilizing 
standards of interpreting the Rorschach that take into account radical
cultural differences. Cook (25), in a study of fifty Samoan high-school 
boys, found marked differences between them and European and American 
boys, and concluded that the Rorschach cannot be interpreted for Samoans 
in terms of the criteria established for other cultural groups, and DuBois 
and Oberholzer (29) came to a similar conclusion after a study of Alorese,
Dutch East Indians. 


Group Method 


One of the recent advances in the Rorschach has been the development
of the group method by Harrower-Erickson (47, 48, 51). The group 
Rorschach bears the same relationship to the individual examination as 
the group intelligence test bears to the individual Binet examination. 
One major difference is that it cannot be applied to young children. In 
the group method the ten plates are projected on a screen under standard 
conditions and the responses are written out by the subjects, various 
methods being employed to obtain information usually obtained in the
individual inquiry. In her various studies Harrower-Erickson found the 
group Rorschach valuable as a screening device, and the results of the 
group method have, in the main, corresponded closely with those of the 
individual application. Several investigators, notably Hertz (52) and 
Lindner and Chapman (80), have modified Harrower-Erickson’s group 
method in some details, believing their modifications are important, but 
these “improvements” do not seem very important. Harrower-Erickson 
and Steiner (51) and Hertzman (59) have conducted comparative studies 
employing the individual and group methods with the same subjects,
and concluded that the two methods are sufficiently similar to justify 
the use of the group method as a rapid screening procedure. 


Other Modifications 


Two other noteworthy modifications of the Rorschach have been made 
in recent years. Munroe (93, 94) developed the inspection technic, in
which Rorschach protocols are scanned rapidly for specific signs of 
maladjustment. Munroe validated her technic by comparison of these 
Rorschach ratings with other technics, including psychiatric interviews 
and questionnaires, and found a high degree of agreement. The other 
recent development is the graphic Rorschach method. This is a supplement
to the individual Rorschach in which the subject sketches the impression
he has just described and indicates the relationship between
his drawing and the Rorschach plate. The method is described by Rochlin 
and Levine (106). Levine and Grassi (77), studying the drawings of 150 
subjects, found that they fell into a continuum that ranged from the “blot 
dominated” response at one extreme to the “concept dominated” at the 
other. Most serious pathological cases fell in these extremes, with patients 
showing organic brain pathology and deteriorated schizophrenics in the 
former group, and an undifferentiated number of pathological types in 
the latter. Grassi (37) found the graphic Rorschach useful for establishing 
the presence of hallucinations in schizophrenics, and for determining 
prognosis in that group. 


Miscellaneous Applications 


The Rorschach has been put to a great variety of uses other than 
those already mentioned. Krugman (72, 73) described its uses in the 
child guidance clinic and in work with children in general. Munroe (95) 
found it useful in student guidance at college. Krafft and Vorhaus (71) 
described its application to family case work. Davidson (27), studying 
142 gifted children with a mean IQ of 143, found these children well 
adjusted emotionally, and family income no significant source of variation 
in influencing the personality pattern of the child or in determining 
the degree of adjustment. Davidson transferred the use of Rorschach 
signs from the pathological to the normal, and hypothesized seventeen 
signs of adjustment. Margulies (85) likewise utilized signs of adjust- 
ment in studying successful and unsuccessful students in Grades VIII 
and IX, and found that unsuccessful students showed significantly more 
color shock, shading shock, and animal responses. Piotrowski (101, 102) 
developed eight Rorschach components corresponding to the same number 
of desirable traits in mechanical work and suggested that this approach, 
which is actually the “sign” approach, can be developed for educational 
and vocational guidance in other areas. Goldfarb (36), comparing case 
studies with Rorschach interpretations of eight enuretic children between 
seven and ten years of age, found six of them aggressive, two fearful 
and withdrawing, and all of them emotionally immature. Endacott (31) 
studied one hundred male juvenile delinquents with the Rorschach and 
found them inhibited emotionally, and pedantic and well controlled in- 
tellectually. Endacott believes that this rigid personality pattern is the 
result of strong pressures and frustrations. Alcoholism has been studied 
by several Rorschach investigators. Kelley and Barrera (65) admin- 
istered Rorschach examinations to ten normal subjects before ingestion 
of alcohol and again forty minutes later. Some shifting in personality re- 
sulted and this led to the conclusion that the Rorschach method is valid 
for indicating early clinical changes in personality. Jastak (61) studied 
ten alcoholics and found that all had abnormal Rorschach records, but 
drew no conclusions as to whether the abnormalities were inherent in 
the patients or whether they resulted from alcoholism. Jastak suggested
that both factors were interdependent. Billig and Sullivan (15) studied
forty patients hospitalized for chronic alcoholism and found that, as a 
group, the personality picture was one of self-centered wish-fulfillment,
weak emotional control, anxiety and concern about the body, and high 
ambition with limited actual achievement. 

The uses of the Rorschach described in this review are by no means 
complete. Enough have been given, however, to indicate the extent to 
which the Rorschach method has established itself as part of the 
clinical and research armamentarium. The military uses of the Rorschach 
have not been discussed since Chapter IX of this issue deals with 
psychological tests in the armed services. 


Thematic Apperception Tests 


Next to the Rorschach, the Thematic Apperception Test has claimed 
most attention. Indeed, in numerous testing programs the Thematic Ap- 
perception Test and the Rorschach are listed as the two preferred methods 
for the study of personality. Actually, much yet remains to be done in 
exploration of the use of the Thematic Apperception method before it 
can receive general clinical use. Probably, in most cases, the Thematic 
Apperception Test is now being used in exploratory fashion and interpreted
with whatever insight the counselor can bring to it rather
than by following the method of analysis proposed by Murray (97) in 
terms of need and press. 

Several papers have discussed the clinical use of the Thematic Ap- 
perception Test. Balken and Masserman (5) reported the characteristic 
stories given by three types of psychiatric patients according to their 
formal aspects. Masserman and Balken (86) and Balken and Vander 
Veer (6) suggested the analysis of thematic apperception stories from 
a psychoanalytical point of view. Three papers dealing with the use 
of the Thematic Apperception Test with mentally disordered persons 
appeared in a symposium in a 1940 issue of Character and Personality. 
Harrison (44, 45), interpreting the results of the test blind, found that 
there is considerable correspondence with biographical and personality 
data. Rotter (113) analyzed the test from the point of view of certain 
structural features. Christenson (22), using a simplified method of ad- 
ministering the Thematic Apperception Test, believes that the method 
reveals significant affective content and aids in distinguishing between 
the major reaction types. Rapaport (104) analyzed the formal aspects 
of the responses and gave clinical illustrations. Sarason (116), using 
the test with mentally deficient girls, found that aggression, desire for 
affection, rebellion against parents, guilt feelings, and feelings of lone- 
liness are the most frequent themes. 

A number of experimental investigations of the Thematic Apperception 
method have been carried out at the Harvard Psychological Clinic 
under the direction of Murray. These were to have been reported at the 
1942 meeting of the American Psychological Association which unfortunately
was never held. These papers are available only in abstract
but a complete report of the experimental studies at the Harvard 
Psychological Clinic has been prepared by Murray and will be published. 
A brief reference to the four abstracts which have already appeared 
will indicate the nature of this experimental work. Rodnick (107) found 
that students respond on the Thematic Apperception Test somewhat dif- 
ferently after being placed in a frustrating situation and that well-adjusted 
students respond in a different manner to frustration than the poorly 
adjusted. Bellak (12) reported that a student whose stories are criticized 
tends to include more aggressive elements in succeeding stories. Tomkins 
(125) studied the effect of repeating the Thematic Apperception Test and 
found that altho the stories to a given picture differ on successive trials 
there is a continuity in the main themes produced. Wyatt (133) believes 
as a result of investigation of the formal factors in the Thematic Appercep- 
tion Test that they have little diagnostic value. 

The comprehensive study under the direction of Sandford (7, 115) 
of forty-eight normal children in a private school in Massachusetts is of 
particular importance. He analyzed the results of the Thematic Appercep- 
tion Test with great care and presented the intercorrelation of all of the 
variables as well as correlation of these variables with a number of other 
physical, mental, and emotional factors which were included in the 
study. Sandford attempted to isolate the clusters of factors growing 
out of the Thematic Apperception method and used these clusters as an 
attempt to add to our understanding of the structure of personality. 

In addition to these investigations using Murray’s Thematic Appercep- 
tion Test, there have been investigations of the story method. One signifi- 
cant study by Proshansky (103) attempted to use the picture-story method 
for the study of attitudes. He was interested in studying the attitudes 
of college students toward labor and found considerable agreement be- 
tween his analysis of stories written from pictures and Newcomb’s Attitude 
Scale. This is an auspicious beginning of the use of projective methods 
in the study of specific attitudes. McCowan (87) and Grotjahn (38) gave 
two reports of the analyses of a single story written by individuals and 
a comparison of the analyses of these stories with facts known about 
these individuals. Studies are needed to provide information as to how 
younger children respond to pictures. Amen (1) presented a series of 
fifteen pictures to young children and reported on the changes in the inter- 
pretation with increasing age. Wright (132) used a method which will 
undoubtedly see considerable further development, in which he described 
a situation to a child and then had the child respond to it by telling a 
story. In this particular experiment the situations were those involving 
conflict, and the child’s type of reaction to the conflict was revealed thru the 
story which was told. 

A study by Sandford (114), altho ostensibly dealing with the relation 
of speech and personality, belongs in this section dealing with the story 
method inasmuch as it deals with the personality indications of a detailed
analysis of speech. A total of 234 mechanical, grammatical, psychogrammatical,
and lexical categories were analyzed, and compared with the personality
characteristics of individuals. The results are illustrated for two
individuals. This study is of basic importance for anyone who plans to 
relate language qualities with personality characteristics. 


Play Technics 


Free play with toy or play materials has continued to impress psy- 
chologists with its diagnostic possibilities. Tallman and Goldensohn’s 
paper (124) is probably the most comprehensive review and analysis of 
play technic that has appeared during the three-year period. A number 
of moot points of technic are discussed. Altho these authors are thinking 
primarily in terms of play therapy, they are also interested in the diag- 
nostic values of play. Another important contribution to the growing 
literature on play diagnosis is the monograph by Erikson (32), in 
which he presents his first comprehensive discussion of his method of 
using play for studying children’s problems clinically. 

Baruch (8, 9) has shown how play can be used in a nursery school 
for the study of a child’s developing personality. She discusses how,
thru play, a nursery-school teacher can grow in her understanding of 
children. Symonds (123) has described how play might be used as a 
test of the child’s readiness for school and suggests that by observing 
a child at play, one can determine the child’s attention, his ability 
to carry play thru to a conclusion, his constructiveness, his freedom 
from inhibitions, and his relationships with the examiner. Levy (78),
whose monograph on Studies in Sibling Rivalry has been a classic, has 
described how a child’s hostility patterns could be revealed thru the use 
of sibling rivalry play, dreams, drawings, Rorschach records, and the 
clinical history, and how these various approaches supplement and cor- 
roborate each other. 

A number of papers discuss play diagnosis in a general way. Amster 
(2) discusses diagnosis as one of six uses of play in therapeutic treat- 
ment of young children. Weiss-Frankl (128), using a modification of the 
play interview in the study of children’s personalities, found that the 
child’s treatment of the materials tended to follow a developmental se- 
quence of exploration, experimentation, and finally projection. Despert 
(28), studying children’s verbal, motor, and affective expressions in doll 
play, found that they dramatize and express their affective relations with 
their family. Watson (127) in a popular article illustrates the diagnostic 
and therapeutic uses of play with clinical material. An article by Kanner 
(64) tends to be somewhat critical of play diagnosis, poking subtle fun at 
the interpretations of play made by certain workers. Roheim (108, 109) 
analyzed the results of play with primitive children and found that their 
play has some of the same characteristics found in the play of the children 
of our own culture. A study by Bowley (17) is important for its discussion
of methods of recording the results of observations of methods of
play.

A number of papers have dealt more specifically with play therapy 
than the diagnostic values of play, but because the two are closely related 
it seems worthwhile to refer to these discussions briefly even tho they 
depart somewhat from the topic of this chapter. There are six general 
articles discussing play therapy, all of them helpful. Newell (98) divided 
play therapy into two general types, free and controlled, and discussed 
their general advantages and disadvantages as well as the problems for 
which each is best suited. Bender and Woltmann (14) gave a brief but com- 
prehensive survey of the theories of the leading authorities on play 
therapy and a discussion of the most commonly used materials. A sum- 
mary of the Freudian viewpoint and the analysis of a case is given 
by Knoepfmacher (70) who believes the chief value of play is cathartic. 
Cameron (21) reviewed the theories of play therapy of Solomon, Levy, 
and Gitelson. It is his belief that unless the therapist is analytically 
trained, play therapy should be conducted largely on the basis of re- 
lationship rather than interpretation. Whiles (129) discussed the types 
of equipment to be used in play therapy and pointed out certain pitfalls
against which the therapist should be on guard. Bühler (19), in a
popular article for parents and others who are not scientifically trained, 
discussed the aims and methods of play therapy. 

Conn (23, 24) discussed the use of play therapy in the treatment of 
fearful children. Solomon (120) has contributed a supplementary article 
to his earlier one on active play therapy in which he advocates the 
addition to the usual family group of a doll to represent the therapist. 
Using ordinary office equipment, principally the dictaphone, Durfee (30) 
reported considerable therapeutic success with boys ten years of age and 
over. Lyle and Holly (83) discussed the making of puppets which they 
feel provides release to a creative urge and offers a sense of achievement. 
Jenkins and Beckh (62) presented simple and direct suggestions for 
the making and use of finger puppets and masks as a method of in- 
dividual therapy. Jacoby (60) tells of the experience of one child in 
nursery school from the age of four to six and indicates how the 
nursery-school experience was therapeutic. 


Handwriting 


Work in this area has not been always characterized by the highest 
scientific standards. However, a study by Lewinson and Zubin (79) is 
outstanding as a scientific contribution in the field of graphology. Fol- 
lowing Klages’ theory and based on an elaborate analysis of handwriting 
into its elements, these authors have drawn up a number of rating scales for 
the quantification of handwriting analysis. Anyone wishing to do research
on the use of handwriting as a projective technic must acquaint himself
with this study.

Three studies, by Crider (26), Middletown (90), and Super (122),
put graphology to the experimental test and found that the results do not 
correlate substantially with results from other psychological tests. How- 
ever, these results may not be entirely fair to the claims of graphologists 
inasmuch as the psychological tests by which they were validated are 
not in themselves too trustworthy. There is still a need for experimental 
validation of graphological methods to be conducted by someone who is 
sympathetic to the aims and methods of graphologists but who is at the 
same time acquainted with the methods of psychological inquiry. 

Long and Tiffin (81) found that graphology still has a large following
among business executives.


Drawing and Painting 


Interest in these media has been persistent but not widespread during 
the three-year period. Bender and Wolfson (13) have continued their 
analyses of the drawings of children in the psychiatric ward at Bellevue 
Hospital. In the present article they discuss the nautical themes of boats 
and water appearing in children’s drawings which they believe throw 
light upon problems centering around the child’s early development. 
Schilder and Levine (118) analyzed the abstract drawings of patients and 
found that even in their abstractions individuals are revealing some of 
their important drives. A study of the formal aspects of the drawings of 
mental defectives led Lowenfeld (82) to the conclusion that the creative 
work of mental defectives indicates their isolation and lack of feeling 
for body form. Schmidl-Waehner (119) proposed supplementing the 
analysis of drawings thru the content with consideration of the formal 
features in drawing which have a certain relation to the Rorschach method. 
Anastasi and Foley (3), in a careful study of the drawings of 680 sub- 
jects illustrating four themes, found that they do not differentiate be- 
tween normal and psychotic subjects altho there were some significant dif- 
ferences in a number of specific categories. Drawings of a single child 
are interpreted psychoanalytically by McIntosh and Pickford (88). 


Voice 


Moses (91, 92) analyzed different features and characteristics of the 
voice as related to personality traits thereby indicating how voice might 
be used as a projective technic. In another paper, Jones (63) reported a 
study in which Moses was able to show remarkable agreement in the 
blind interpretation of a voice record and the results of a blind Rorschach 
interpretation.


Miscellaneous 

For continued growth in the use of projective technic, it is a healthy 
sign that various individuals are experimenting with new projective media. 
Haggard (39) described a method of having children create stories 
for their favorite comic strip characters, thus projectively describing
their own personal difficulties. A report by Tuddenham (126) describes
how the reputation or “guess who” test can be used as a projective technic 
by studying the characteristics which a child attributes to others and 
to himself. In this case there is validating information contained in the 
evaluation of the subject by other children and by the teacher. Tuddenham
discusses the possible values of the child’s own ratings as revealing
his tendencies.


Four motion pictures dealing with (a) balloons, (b) frustration play 
technics, (c) finger painting, and (d) “This is Robert,” a study of per- 
sonal growth in a preschool child, have been prepared by Fisher, Stone, 
and Bucher (33). These should have value for those who wish to see 
exactly how projective technics are administered to younger children. 
Lowenfeld’s World Test has been prepared for commercial distribution 
thru the Psychological Corporation by Bühler and Kelly (20).


Bibliography 


1. Amen, Elisabeth W. “Individual Differences in Apperceptive Reaction: A Study
of the Responses of Pre-School Children to Pictures.” Genetic Psychology
Monographs 23: 319-85; May 1941.

2. Amster, Fannie. “Differential Uses of Play in Treatment of Young Children.”
American Journal of Orthopsychiatry 13: 62-68; January 1943.

3. Anastasi, Anne, and Foley, John P., Jr. “An Experimental Study of Drawing
Behavior of Adult Psychotics in Comparison with that of a Normal Control
Group.” Psychological Bulletin 39: 462-63; July 1942.

4. Arluck, Edward A. “A Study of Some Personality Differences between Epileptics
and Normals.” Rorschach Research Exchange 4: 154-56; October 1940.

5. Balken, Eva R., and Masserman, Jules H. “The Language of Phantasy: III.
The Language of the Phantasies of Patients with Conversion Hysteria, Anxiety
State, and Obsessive-Compulsive Neuroses.” Journal of Psychology 10: 75-85;
July 1940.

6. Balken, Eva R., and Vander Veer, Adrian H. “The Clinical Application of a
Test of Imagination to Neurotic Children.” American Journal of
Orthopsychiatry 12: 68-80; January 1942.

7. Barker, Roger G., and others. Child Behavior and Development. New York:
McGraw-Hill Book Co., 1943. Chapter 32 (by R. Nevitt Sandford),
“Personality Patterns in School Children.”

8. Baruch, Dorothy W. “Aggression During Doll Play in a Pre-School.” American
Journal of Orthopsychiatry 11: 252-59; April 1941.

9. Baruch, Dorothy W. “Doll Play in Pre-School as an Aid in Understanding the
Child.” Mental Hygiene 24: 566-77; October 1940.

10. Beck, Samuel J. Introduction to the Rorschach Method: A Manual of Personality
Study. American Orthopsychiatric Association Monograph No. 1. New York:
American Orthopsychiatric Association, 1937. 278 p.

11. Beck, Samuel J. “The Rorschach Test in Psychopathology.” Journal of Consulting
Psychology 7: 103-11; March-April 1943.

12. Bellak, Leopold. “An Experimental Investigation of Projection.” Psychological
Bulletin 39: 489-90; July 1942.

13. Bender, Lauretta, and Wolfson, William Q. “The Nautical Theme in the Art
and Fantasy of Children.” American Journal of Orthopsychiatry 13: 462-67;
July 1943.

14. Bender, Lauretta, and Woltmann, Adolf G. “Play and Psychotherapy.”
Nervous Child 1: 17-42; 1941.

15. Billig, Otto, and Sullivan, D. J. “Prognostic Data in Chronic Alcoholism.”
Rorschach Research Exchange 6: 117-25; July 1942.

16. Bochner, Ruth, and Halpern, Florence. The Clinical Application of the
Rorschach Test. New York: Grune and Stratton, 1942. 216 p.

REVIEW OF EDUCATIONAL RESEARCH    Vol. XIV, No. 1

17. Bowley, Agatha H. “A Study of the Factors Influencing the General Development
of the Child During the Pre-School Years by Means of Record Forms.”
British Journal of Psychology Monograph Supplements, No. 25, 1942.

18. Brussel, James A.; Grassi, Joseph R.; and Melnicker, Aaron A. “The
Rorschach Method and Postconcussion Syndrome.” Psychiatric Quarterly 16:
707-43; October 1942.

19. Bühler, Charlotte. “Guidance Contributes Play Therapy.” Child Study 18:
115-16; Summer 1941.

20. Bühler, Charlotte, and Kelly, G. The World Test. New York: Psychological
Corporation, 1941.

21. Cameron, William M. “The Treatment of Children in Psychiatric Clinics with
Particular Reference to the Use of Play Technics.” Bulletin of the Menninger
Clinic 4: 172-80; November 1940.

22. Christenson, James A., Jr. “Clinical Application of the Thematic Apperception
Test.” Journal of Abnormal and Social Psychology 38: 104-106; January 1943.

23. Conn, Jacob H. “The Timid Dependent Child.” Journal of Pediatrics 19: 91-102;
July 1941.

24. Conn, Jacob H. “The Treatment of Fearful Children.” American Journal of
Orthopsychiatry 11: 744-51; October 1941.

25. Cook, Phil H. “The Application of the Rorschach Test to a Samoan Group.”
Rorschach Research Exchange 6: 51-60; April 1942.

26. Crider, Blake. “The Reliability and Validity of Two Graphologists.” Journal of
Applied Psychology 25: 323-25; June 1941.

27. Davidson, Helen H. Personality and Economic Background: A Study of Highly
Intelligent Children. New York: King’s Crown Press, 1943. 189 p.

28. Despert, J. Louise. “A Method for the Study of Personality Reactions in
Pre-School Age Children by Means of Their Play.” Journal of Psychology 9: 17-29;
January 1940.

29. DuBois, Cora, and Oberholzer, Emil. “Rorschach Tests and Native Personality
in Alor, Dutch East Indies.” Transactions of the New York Academy of Sciences
4: 168-70; March 1942.

30. Durfee, M. B. “Use of Ordinary Office Equipment in Play Therapy.” American
Journal of Orthopsychiatry 12: 495-502; July 1942.

31. Endacott, John L. “The Results of 100 Male Juvenile Delinquents on the
Rorschach Ink Blot Test.” Journal of Criminal Psychopathology 3: 41-50; July
1941.

32. Erikson, H. “Studies in the Interpretation of Play; Clinical Observation of
Play Disruption in Young Children.” Genetic Psychology Monographs 22:
557-671; November 1940.

33. Fisher, Mary S.; Stone, L. Joseph; and Bucher, John. “Balloons: Demonstration
of a Projective Technique for the Study of Aggression and Destruction
in Young Children.” Film Library. New York: New York University, 1941.
“Finger Painting: Children’s Use of Plastic Materials.” Film Library. New
York: New York University, 1941. “Frustration Play Techniques: I-Blocking
Games; II-Frustration and Hostility Game.” Film Library. New York: New
York University, 1942. “This Is Robert: A Study of Personality Growth in a
Pre-School Child.” Film Library. New York: New York University, 1943.

34. Fosberg, Irving A. “An Experimental Study of the Reliability of the Rorschach
Psychodiagnostic Technique.” Rorschach Research Exchange 5: 72-84; April
1941.

35. Frank, Lawrence K. “The Rorschach Method: Forward.” Journal of Consulting
Psychology 7: 63-66; March-April 1943.

36. Goldfarb, William. “Personality Trends in a Group of Enuretic Children
below the Age of Ten.” Rorschach Research Exchange 6: 28-38; January 1942.

37. Grassi, Joseph R. “Contrasting Schizophrenic Patterns in the Graphic Rorschach.”
Psychiatric Quarterly 16: 646-59; October 1942.

38. Grotjahn, Martin. “A Child Talks about Pictures; Observations about the
Integration of Fantasy with the Process of Thinking.” Psychoanalytic Quarterly
10: 385-94; July 1941.

39. Haggard, Ernest A. “A Projective Technique Using Comic Strip Characters.”
Character and Personality 10: 289-95; June 1942.

February 1944    METHODS IN THE STUDY OF PERSONALITY

40. Hallowell, A. Irving. “Acculturation Processes and Personality Changes as
Indicated by the Rorschach Technique.” Rorschach Research Exchange 6: 42-50;
April 1942.

41. Hallowell, A. Irving. “The Rorschach Method as an Aid in the Study of
Personalities in Primitive Societies.” Character and Personality 9: 235-45; March
1941.

42. Hallowell, A. Irving. “The Rorschach as a Tool for Investigating Cultural
Variables and Individual Differences in the Study of Personality in Primitive
Societies.” Rorschach Research Exchange 5: 31-34; January 1941.

43. Halpern, Florence. “Rorschach Interpretation of the Personality Structure of
Schizophrenics Who Benefit from Insulin Therapy.” Psychiatric Quarterly 14:
826-33; October 1940.

44. Harrison, Ross. “Studies in the Use and Validity of the Thematic Apperception
Test with Mentally Disordered Patients. II-A Quantitative Validity Study.”
Character and Personality 9: 122-33; December 1940.

45. Harrison, Ross. “Studies in the Use and Validity of the Thematic Apperception
Test with Mentally Disordered Patients. III-Validation by the Method of Blind
Analysis.” Character and Personality 9: 134-38; December 1940.

46. Harrower-Erickson, Mollie R. “Diagnosis of Psychogenic Factors in Disease
by Means of the Rorschach Method.” Psychiatric Quarterly 17: 57-66; January
1943.

47. Harrower-Erickson, Mollie R. “Directions for Administration of the Rorschach
Group Test.” Journal of Genetic Psychology 62: 105-17; March 1943.

48. Harrower-Erickson, Mollie R. “Large Scale Investigation with the Rorschach
Method.” Journal of Consulting Psychology 7: 120-26; March-April 1943.

49. Harrower-Erickson, Mollie R. “Personality Changes Accompanying Cerebral
Lesions: I. Rorschach Studies of Patients with Cerebral Tumors.” Archives of
Neurology and Psychiatry 43: 859-90; May 1940.

50. Harrower-Erickson, Mollie R. “The Value and Limitations of the So-Called
‘Neurotic Signs.’” Rorschach Research Exchange 6: 109-14; July 1942.

51. Harrower-Erickson, Mollie R., and Steiner, Matilda E. “Modification of the
Rorschach Method for Use as a Group Test.” Journal of Genetic Psychology
62: 119-33; March 1943.

52. Hertz, Marguerite R. “Modification of the Rorschach Ink Blot Test for Large
Scale Application.” American Journal of Orthopsychiatry 13: 191-212; April
1943.

53. Hertz, Marguerite R. “Personality Patterns in Adolescence as Portrayed by the
Rorschach Ink-Blot Method: I-The Movement Factors.” Journal of General
Psychology 27: 119-88; July 1942. (With Baker, Elizabeth) II-“The Color
Factors.” 28: 3-61; January 1943. III-“The ‘Erlebnistypus.’” 28: 225-276;
April 1943. IV-“The ‘Erlebnistypus’ (A Typological Study).” 29: 3-45; July
1943.

54. Hertz, Marguerite R. “The Rorschach Method: Science or Mystery.” Journal
of Consulting Psychology 7: 67-79; March-April 1943.

55. Hertz, Marguerite R. “Rorschach: Twenty Years After.” Rorschach Research
Exchange 5: 90-129; July 1941. (Also in Psychological Bulletin 39: 529-72;
October 1942.)

56. Hertz, Marguerite R. “The Scoring of the Rorschach Ink-Blot Method as
Developed by the Brush Foundation.” Rorschach Research Exchange 6: 16-27;
January 1942.

57. Hertz, Marguerite R. “The Shading Response in the Rorschach Ink-Blot Test:
A Review of Its Scoring and Interpretation.” Journal of General Psychology 23:
123-67; July 1940.

58. Hertz, Marguerite R. “Validity of the Rorschach Method.” American Journal
of Orthopsychiatry 11: 512-20; July 1941.

59. Hertzman, Max. “A Comparison of the Individual and Group Rorschach Tests.”
Rorschach Research Exchange 6: 89-108; July 1942.

60. Jacoby, Jutta. “The Nursery School as an Experience in Therapy.” American
Journal of Orthopsychiatry 13: 162-66; January 1943.
Jasrak, JoserpH. “Rorschach Performance of Alcoholic Patients.” Delaware 
State Medical Journal 12: 120-23; May 1940. 


95 














Review oF EpucaTIONAL RESEARCH Vol. XIV, No. 1 





62. Jenkins, Richard L., and Beckh, Erica. “Finger Puppets and Mask Making
as Media for Work with Children.” American Journal of Orthopsychiatry 12:
294-301; April 1942.

63. Jones, Harold E. “The Adolescent Growth Study: Analysis of Voice Records.”
Journal of Consulting Psychology 6: 255-56; September-October 1942.

64. Kanner, Leo. “Play Investigation and Play Treatment of Children’s Behavior
Disorders.” Journal of Pediatrics 17: 533-46; October 1940.

65. Kelley, Douglas M., and Barrera, S. Eugene. “Rorschach Studies in Acute
Experimental Alcoholic Intoxication.” American Journal of Psychiatry 97:
1341-64; May 1941.

66. Kelley, Douglas M., and others. “Intravenous Sodium Amytal Medication as
an Aid to the Rorschach Method.” Psychiatric Quarterly 15: 68-73; January
1941.

67. Klopfer, Bruno. “Instruction in the Rorschach Method.” Journal of Consulting
Psychology 7: 112-19; March-April 1943.

68. Klopfer, Bruno, and Kelley, Douglas M. The Rorschach Technique. Yonkers-
on-Hudson, N. Y.: World Book Co., 1942. 436 p.

69. Klopfer, Bruno, and Margulies, Helen. “Rorschach Reactions in Early Child-
hood.” Rorschach Research Exchange 5: 1-23; January 1941.

70. Knoepfmacher, Juliana. “The Use of Play Diagnosis and Therapy in Psychiatric
Case Work.” Smith College Studies in Social Work 12: 217-62; March 1942.

71. Krafft, Margaret R., and Vorhaus, Pauline G. “The Application of the
Rorschach Method in a Family Case Work Agency.” Rorschach Research Ex-
change 7: 28-35; January 1943.

72. Krugman, Judith I. “A Clinical Validation of the Rorschach with Problem
Children.” Rorschach Research Exchange 6: 61-70; April 1942.

73. Krugman, Morris. “Rorschach Examination in a Child Guidance Clinic.” Ameri-
can Journal of Orthopsychiatry 11: 503-12; July 1941.

74. Krugman, Morris. “The Rorschach in Child Guidance.” Journal of Consulting
Psychology 7: 80-88; March-April 1943.

75. Krugman, Morris. “Out of the Ink Well: The Rorschach Method.” Rorschach
Research Exchange 4: 91-101; July 1940. (Also in Character and Personality
9: 91-110; December 1940.)

76. Lerner, Eugene, and Murphy, Lois B. Methods for the Study of Personality in
Young Children. Society for Research in Child Development Monograph, Vol.
6, No. 4, Serial No. 30. 1941.

77. Levine, Kate N., and Grassi, Joseph R. “The Relation between Blot and Con-
cept in Graphic Rorschach Responses.” Rorschach Research Exchange 6: 71-73;
April 1942.

78. Levy, David M. “Hostility Patterns.” American Journal of Orthopsychiatry 13:
441-61; July 1943.

79. Lewinson, Thea S., and Zubin, Joseph. Handwriting Analysis. New York: King’s
Crown Press, 1942. 147 p.

80. Lindner, Robert M., and Chapman, K. W. “An Eclectic Group Method.”
Rorschach Research Exchange 6: 139-46; October 1942.

81. Long, William F., and Tiffin, Joseph. “A Note on the Use of Graphology by
Industry.” Journal of Applied Psychology 25: 469-71; August 1941.

82. Lowenfeld, Viktor. “Self Adjustment thru Creative Activity.” American Journal
of Mental Deficiency 45: 366-73; January 1941.

83. Lyle, Jeanetta, and Holly, Sophie B. “The Therapeutic Value of Puppets.”
Bulletin of the Menninger Clinic 5: 223-26; November 1941.

84. Macfarlane, Jean W. “Problems of Validation Inherent in Projective Methods.”
American Journal of Orthopsychiatry 12: 405-11; July 1942.

85. Margulies, Helen. “Rorschach Responses of Successful and Unsuccessful
Students.” Archives of Psychology, Vol. 38, No. 271. New York: Columbia
University Press, July 1942. 61 p.

86. Masserman, Jules H., and Balken, Eva R. “The Psychoanalytic and Psychiatric
Significance of Phantasy.” Psychoanalytic Review 26: 343-79; July 1939. 535-49;
October 1939.

87. McCowan, P. K. “The Subconscious in Story Writing.” Journal of Mental Science
89: 59-63; January 1943.



















February 1944 METHODS IN THE STUDY OF PERSONALITY 





88. McIntosh, Janette R., and Pickford, R. W. “Some Clinical and Artistic Aspects
of a Child’s Drawing.” British Journal of Medical Psychology 19: 342-61;
June 1943.

89. Miale, Florence R., and Harrower-Erickson, Mollie R. “Personality Structure
in the Psychoneuroses.” Rorschach Research Exchange 4: 71-74; April 1940.

90. Middleton, Warren C. “The Ability of Untrained Subjects to Judge Neurot-
icism, Self-Confidence and Sociability from Handwriting Samples.” Character
and Personality 9: 227-34; March 1941.

91. Moses, Paul J. “Social Adjustment and the Voice.” Quarterly Journal of Speech
27: 532-37; December 1941.

92. Moses, Paul J. “The Study of Personality from Records of the Voice.” Journal
of Consulting Psychology 6: 257-61; September-October 1942.

93. Munroe, Ruth L. “An Experiment in Large Scale Testing by a Modification
of the Rorschach Method.” Journal of Psychology 13: 229-63; April 1942.

94. Munroe, Ruth L. “Inspection Technique: A Modification of the Rorschach
Method of Personality Diagnosis for Large Scale Application.” Rorschach
Research Exchange 5: 166-90; October 1941.

95. Munroe, Ruth L. “Use of the Rorschach Method in College Guidance.” Journal
of Consulting Psychology 7: 89-96; March-April 1943.

96. Murphy, Lois B. “Patterns of Spontaneity and Constraint in the Use of Pro-
jective Materials by Pre-School Children.” Transactions of the New York
Academy of Science 4: 124-28; 1942.

97. Murray, Henry A. Explorations in Personality. New York: Oxford University
Press, 1938. 761 p.

98. Newell, H. Whitman. “Play Therapy in Child Psychiatry.” American Journal
of Orthopsychiatry 11: 245-51; April 1941.

99. Paulsen, Alma. “Rorschachs of School Beginners.” Rorschach Research Ex-
change 5: 24-29; January 1941.

100. Piotrowski, Zygmunt A. “A Simple Experimental Device for the Prediction of
Outcome of Insulin Treatment in Schizophrenia.” Psychiatric Quarterly 14:
267-73; April 1940.

101. Piotrowski, Zygmunt A. “Tentative Rorschach Formulae for Educational and
Vocational Guidance in Adolescence.” Rorschach Research Exchange 7: 16-27;
January 1943.

102. Piotrowski, Zygmunt A. “Use of the Rorschach in Vocational Selection.”
Journal of Consulting Psychology 7: 97-102; March-April 1943.

103. Proshansky, Harold M. “A Projective Method for the Study of Attitudes.”
Journal of Abnormal and Social Psychology 38: 393-95; July 1943.

104. Rapaport, David. “The Clinical Application of the Thematic Apperception Test.”
Bulletin of the Menninger Clinic 7: 106-13; May 1943.

105. Rapaport, David. “Principles Underlying Projective Techniques.” Character and
Personality 10: 213-19; March 1942.

106. Rochlin, Gregory N., and Levine, Kate N. “The Graphic Rorschach Test I.”
Archives of Neurology and Psychiatry 47: 438-48; March 1942.

107. Rodnick, Eliot H. “Projective Reactions to Induced Frustration as a Measure
of Social Adjustment.” Psychological Bulletin 39: 489; July 1942.

108. Roheim, Geza. “Children’s Games and Rhymes in Duau (Normanby Island).”
American Anthropologist 45: 99-119; January-March 1943.

109. Roheim, Geza. “Play Analysis with Normanby Island Children.” American Journal
of Orthopsychiatry 11: 524-30; July 1941.

110. Rorschach, Hermann. Psychodiagnostics: A Diagnostic Test Based on Perception.
(Translated by Paul Lemkau and Bernard Kronenberg.) Berne, Switzerland:
Verlag Hans Huber; New York: Grune and Stratton, 1942. 226 p.

111. Rosenzweig, Saul. “Fantasy and Personality and Its Study by Test Procedure.”
Journal of Abnormal and Social Psychology 37: 40-51; January 1942.

112. Ross, W. Donald. “The Contribution of the Rorschach Method to Clinical
Diagnosis.” Journal of Mental Science 87: 331-48; July 1941.

113. Rotter, Julian B. “Studies in the Use and Validity of the Thematic Appercep-
tion Test with Mentally Disordered Patients. I-Method of Analysis and
Clinical Problems.” Character and Personality 9: 18-34; September 1940.

114. Sanford, Fillmore H. “Speech and Personality: A Comparative Case Study.”
Character and Personality 10: 169-98; March 1942.










115. Sanford, R. Nevitt, and others. Physique, Personality, and Scholarship. Society
for Research in Child Development Monograph, Vol. 8, No. 1, Serial No. 34. 1943.

116. Sarason, Seymour B. “The Use of the Thematic Apperception Test with Mentally
Retarded Children.” American Journal of Mental Deficiency 47: 414-21; April
1943.

117. Schachtel, Anna H.; Henry, Jules; and Henry, Zunia. “Rorschach Analysis
of Pilaga Indian Children.” American Journal of Orthopsychiatry 12: 679-713;
October 1942.

118. Schilder, Paul, and Levine, Esther L. “Abstract Art as an Expression of Human
Problems.” Journal of Nervous and Mental Diseases 95: 1-10; January 1942.

119. Schmidl-Waehner, Trude. “Formal Criteria for the Analysis of Children’s Draw-
ings.” American Journal of Orthopsychiatry 12: 95-104; January 1942.

120. Solomon, Joseph C. “Active Play Therapy; Further Experiences.” American
Journal of Orthopsychiatry 10: 763-82; October 1940.

121. Stavrianos, Bertha. “An Investigation of Sex Differences in Children as Re-
vealed by the Rorschach Method.” Rorschach Research Exchange 6: 168-75;
October 1942.

122. Super, Donald E. “A Comparison of the Diagnosis of a Graphologist with the
Results of Psychological Tests.” Journal of Consulting Psychology 5: 127-33;
May-June 1941.

123. Symonds, Percival M. “Play Technique as a Test of Readiness.” Understanding
the Child 9: 8-14; June 1940.

124. Tallman, Frank F., and Goldensohn, Leon N. “Play Technique.” American
Journal of Orthopsychiatry 11: 551-61; July 1941.

125. Tomkins, Silvan S. “The Limits of Material Obtainable in a Single Case Study
by Daily Administration of the Thematic Apperception Test.” Psychological
Bulletin 39: 490; July 1942.

126. Tuddenham, Read. “The Reputation Test as a Projective Technique.” Psycho-
logical Bulletin 38: 749; October 1941.

127. Watson, Maude E. “Play Technique.” Journal of Pediatrics 17: 674-79; No-
vember 1940.

128. Weiss-Frankl, Anni B. “Play Interviews with Nursery School Children.” Ameri-
can Journal of Orthopsychiatry 11: 33-39; January 1941.

129. Whiles, W. H. “Treatment of Emotional Problems in Childhood.” Journal of
Mental Science 87: 359-69; July 1941.

130. Wolff, Werner. The Expression of Personality. New York: Harper and Brothers,
1943. 334 p.

131. Wolff, Werner. “Projective Methods for Personality Analysis of Expressive
Behavior in Pre-School Children.” Character and Personality 10: 289-95; June
1942.

132. Wright, Beatrice A. “An Experimentally Created Conflict Expressed in a Pro-
jective Technique.” Psychological Bulletin 38: 718; October 1941.

133. Wyatt, Frederick. “Formal Aspects of the Thematic Apperception Test.” Psycho-
logical Bulletin 39: 491; July 1942.

134. Zubin, Joseph. “A Psychometric Approach to the Evaluation of the Rorschach
Test.” Psychiatry 4: 547-66; November 1941.

135. Zubin, Joseph. “A Quantitative Approach to Measuring Regularity of Succes-
sion in the Rorschach Experiment.” Character and Personality 10: 67-78;
September 1941.













CHAPTER VIII 


Measurement of Psychoeducational Growth 


WARREN G. FINDLEY 


This chapter presents matter relevant to the use of tests, psychological
or achievement, to accomplish those purposes which our present knowl- 
edge of human development renders desirable and feasible. Several prin- 
ciples of growth and development are presented and their implications 
for testing indicated. 


Unity of Personal Development 


Smith and others (5) stated that “One of the most influential psy- 
chological principles guiding the work (of evaluation in the Eight-Year 
Study) has been the assumption that the essential characteristic of hu- 
man behavior is its organic unity, and that various aspects of it func- 
tion in close relationship with each other” and “usually no single 
type of growth could be fully achieved without some progress in all 
others.” The definition of intelligence proposed by Stoddard (6), “the
ability to undertake activities that are characterized by (a) difficulty,
(b) complexity, (c) abstractness, (d) economy, (e) adaptiveness to
a goal, (f) social value, and (g) the emergence of originals, and
to maintain such activities under conditions that demand a concentra-
tion of energy and a resistance to emotional forces,” gives a picture
of an effectively functioning individual, strong in the many aspects 
that are tested by the whole set of psychological technics discussed in 
previous chapters of this issue. Such views of unity of being and develop- 
ment suggest the importance of supplementing the technics already 
discussed by observational data on personal behavior and by tests of the 
sort described in Smith and others (5) wherein the student is asked 
to select democratic views on controversial social issues in a consistent 
fashion and choose logical and factually sound arguments supporting 
his views. Moreover, appraisal of student progress must be compre- 
hensive, including measurement of physical and physiological correlates 
of mental and social development as described in Remmers and Gage (4). 


Continuity of Personal Development 


“The most characteristic thing about a child is that he grows; he keeps 
on changing into something else” (6). As a unified personality, he is 
developing toward maturity. Adequate knowledge of this development 
can be acquired only thru continuous evaluation and “the continuity 
of the evaluation process implies that it should go on during all the 
time the teacher can observe the pupil, not only on the special occasions 
when tests are given or report-card grades are determined” (4). This in 
turn implies the maintenance of individual cumulative records, perhaps
less elaborate but of the type discussed by Smith and others (5). It also
implies systematic testing of individual pupil progress from year to year 
in what Lindquist (3) calls “general educational development” and 
achievement of the “ultimate objective of instruction” in describing 
the plan for the “Fall Testing Program for Iowa High Schools” and 
the Iowa Tests of Educational Development (2). “Progress testing” and 
“progress tests” are standard terms in the New York State program (7) 
now developing along similar lines. “Achievement on this year’s test
takes on added significance when judged in the light of previous achieve-
ment and relevant notations about health, social and emotional de-
velopment, etc. that may be entered on such records” (7). A number
of standardized test batteries lend themselves to this use. 


The Dynamic, Interactive Nature of Personality Development 


“Personality is viewed as a product of the interaction of forces within 
the individual and his surroundings” (5). Consequently, “the validity of 
any evaluation device must eventually be traced to and measured by the 
degree to which it satisfies the interacting needs of individuals and the 
social order. . . . It is in terms of this criterion that not only the in- 
structional process should be validated but also every phase of the 
evaluation process, from the total testing program through specific tests 
down to individual test item” (4). 

A corollary of these statements is that rapport between child and 
examiner should be such that the child has confidence in and is willing 
to cooperate with the examiner. This is important not simply to insure a 
valid estimate of the child’s ability on the particular test being ad- 
ministered, but also to further a reaction to tests and testing programs 
that means the values to be gained from the immediate evaluation will
be “acceptable” to the child and will leave him disposed to look forward 
to benefits from any further testing to be attempted. 

Several of the references already cited stress the spacing and scheduling 
of testing as a significant factor in accomplishing rapport. Smith and 
others (5) say of the evaluation in the Eight-Year Study: “the total time 
devoted to testing could not be so great that students and faculty thought 
themselves burdened by tests.” They go on to say: “the schedule had to 
be drawn so that there was no undue concentration of formal tests toward 
the end of the year.” Three of the references (1, 2, 7) stressed the 
great virtue of annual fall testing in this connection. The following state- 
ment is representative: “Testing pupils at the beginning of the school 
year provides data helpful to the teacher as she begins work with her new 
class. . . . Testing at the beginning of the year is more appropriate be- 
cause, after all, the true measure of effective teaching is not what pupils 
know in June, but what survives the vacation to become the basis for 
their further progress. . . . Fall testing avoids the undesirable practice 
of cramming. . . . Testing at this time places a wholesome emphasis
on collaboration between teacher and pupil to achieve goals of instruction
in the year ahead” (7). 

Organization of tests of mental development into sections devoted 
to each of several identifiable abilities characterizes several of the testing 
programs described (1, 2, 3, 5, 7). Not only are such tests more useful
to the teacher, but also they meet the child’s proper interest in learning 
more specifically the nature of his strengths and weaknesses. The mo- 
tivational effect of such specific knowledge is a research finding of long 
standing. 

Procedures for interpreting a child’s achievement to him are significant 
factors in rapport. The major testing programs cited (1, 2, 7) are 
representative of all achievement batteries in their emphasis on use of 
profiles for drawing the child’s attention to his own comparative strengths 
and weaknesses, and away from comparison with the achievement of his 
associates. Smith and others indicated as a fundamental principle of the
interpretative procedures in the Eight-Year Study that they “favored 
instruments and devices which yielded descriptive diagnoses of students 
and which, because of this characteristic, could not be easily converted 
into grades and marks” (5).



CHAPTER IX


Construction and Application of Psychological Tests 
in the Armed Services 


JOHN M. STALNAKER 


A great deal of the scientific work now being engaged in for the war
cannot be reported in full or even in part at the present time. This limita- 
tion applies to the work of psychologists in the development and use of 
tests as well as to the work of the physicists, chemists, and mathematicians. 
Undoubtedly the most authoritative and comprehensive statements which 
can be made at this time must come from the men in the services. Therefore, 
representatives of the Armed Forces and the Armed Services Institutes 
who are concerned with psychological testing were asked to prepare 
a statement of their work. 

The use of psychological tests is today so common and widespread 
in all branches of the services that complete coverage of the work is 
impossible. The reviews presented here are believed to be representative 
of the work in other sections not here reported. The activities of the various 
groups working for the National Defense Research Committee and those 
of the projects sponsored by the Committee on Service Personnel, Selec- 
tion and Training, of the National Research Council are not open for 
report at this time. The activities of various other civilian groups are 
not covered here, even tho their efforts are directed exclusively toward 
the war effort. Examples of such activities are the Aeronautics Aptitude 
Test being used with the Victory Corps and the Army-Navy College 
Qualifying Test. 


United States Armed Forces Institute Examinations 
Examination Staff, Armed Forces Institute 


Members of the armed forces have many opportunities for educational 
growth. Specialized military training, off-duty education, and the less 
formal activities of military life provide much educational experience. 
In order to assist members of the services to capitalize these opportunities 
suitably and to aid schools and colleges in evaluating their attainment,
the Armed Forces Institute has developed a substantial evaluation program. 

The Examination Staff has been directed to construct three major 
types of examinations, of which the first is to serve chiefly as a source 
of information for the student, and the second and third primarily as a 
basis for placement and credit. 

The first type consists of tests to be given students at the completion 
of courses taken in the Institute. These end-of-course tests are used 
primarily to indicate to the student how well he has mastered the work
of the course. They are not designed to be used as a basis for place-
ment or credit in school or college.

The second type includes field or subject examinations. Tests of this 
type are built to measure competence to deal with the material commonly 
provided in high-school or college courses or fields. The report on these 
field examinations should be of particular value to the school or college 
in placing a member of the armed forces when he returns to school or 
college, and in granting him fair credit for his educational attainments. 

The third type of tests consists of the tests of General Educational De- 
velopment. Two batteries of such tests have been developed, one for 
the high-school level and the other for the college level. The tests in- 
clude the kinds of exercises in the major subject fields. As a general 
placement battery, these tests should prove particularly useful for members 
of the armed forces who have been out of school for some time but 
who have had a good many educational experiences since leaving school. 
These tests of educational development are now being standardized on a 
carefully selected sample of educational institutions. 

It should be noted that neither in the case of the field examinations 
nor in that of the tests of general educational development will the Armed 
Forces Institute undertake to dictate to an educational institution the 
amount of credit to be granted. The soldier or sailor applies to the com- 
mandant of the Institute for the examinations which he believes he is 
ready to take. The appropriate examinations are selected and sent to his 
commanding officer for administration. When the tests are returned to the 
Institute for scoring, the officer certifies that the instructions for admin- 
istration of the tests have been followed and that no one but the applicant 
has seen the examinations. The examination results are recorded by one 
of the Institute registrars. Upon request of the man himself, the tran- 
script of his examination record, together with descriptive data on his 
training and assignment in the service, is sent to the high school or college. 

It is possible for the educational institution to interpret these results in 
several ways. The examinations have been constructed to yield part-scores, 
so that it is possible for the Institute to provide a descriptive interpreta- 
tion of the applicant’s attainment. Results can also be reported in com- 
parative terms—that is, in terms of the percentile rank for students in 
school or college. Finally, it is possible for the school or college to 
obtain from the American Council on Education copies of alternate 
forms of the tests, and by giving the tests locally to see whether the score 
made by the soldier is comparable to scores made by its own students. 
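
The comparative report mentioned above can be illustrated with a short sketch of a percentile-rank computation. This is an illustration under stated assumptions, not the Institute's procedure: the function name `percentile_rank`, the convention of counting tied scores as half below, and the norm-group scores are all hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Percent of the norm group below `score`, counting ties as half below.

    The half-credit rule for ties is one common convention and is an
    assumption here; the source does not say which rule was used.
    """
    ordered = sorted(norm_scores)
    if not ordered:
        raise ValueError("norm group is empty")
    below = bisect_left(ordered, score)           # scores strictly lower
    ties = bisect_right(ordered, score) - below   # scores exactly equal
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical norm-group scores, invented for illustration only.
norm_group = [12, 15, 15, 18, 20, 22, 25, 27, 30, 34]
print(percentile_rank(22, norm_group))
```

A school giving an alternate form locally, as the paragraph suggests, would simply pass its own students' scores as the norm group.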


The Army Air Forces Program in Aviation Psychology 
Lieutenant Colonel John C. Flanagan 


The present program of research in aviation psychology was initiated 
in July 1941. The initial assignment of the psychological staff was the 
development of a battery of tests for the selection and classification of
men for training in the Army Air Forces as pilots, bombardiers, or
navigators. Since January 1942 all applicants for such training have been 
examined first with an initial screening test called the Aviation Cadet 
Qualifying Examination and, if accepted, given a more comprehensive 
battery of psychological tests in an Army Air Forces Classification Center. 

The objective of this program has been to develop procedures which 
insure that those applicants best fitted for aircrew training are accepted 
for such training, and that these men be so assigned to the various types 
of training—pilot, bombardier, and navigator—that the resulting air- 
crews will be maximally efficient. Three articles have been prepared 
by professional writers concerning this program (5, 12, 17). General 
discussions of the work have been published by Flanagan (4) and by 
Guilford (6). To keep aviation psychologists in the Army Air Forces 
informed about developments by other groups working on these prob- 
lems, Headquarters, Army Air Forces, has published the Aviation 
Psychology Abstract Series. These abstracts are distributed only to in- 
dividuals working on problems involving the selection and training of 
military personnel. 


Test Development 


The test development program was originally directed toward pro- 
ducing practical and valid measures for the twenty psychological traits 
which an analysis of the records of 1000 men eliminated from pilot 
training indicated were important for success in that type of training. This 
list of traits was revised on the basis of later studies as reported in num- 
bers of the Analysis of Duties Bulletins. Also, a few traits were added, 
such as finger dexterity and mathematical proficiency, which are di- 
rected specifically toward the prediction of success in bombardier and 
navigator training. 

In trying to develop an efficient battery of tests for this purpose, more 
than two hundred tests have been developed. A battery of about twenty 
of these tests has been found to include practically all the psychological 
elements contributing to success which have thus far been identified. The 
current problem is to develop tests which measure functions which are 
sufficiently unique as related to the present battery so that their addition
produces a significant improvement in the predictive efficiency of the
battery as a whole. 
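
Why a new test must be "sufficiently unique" can be seen from the standard formula for the multiple correlation of a criterion $y$ with two predictors. The formula is textbook material rather than anything stated in the source, and the numerical values below are hypothetical.

```latex
R_{y\cdot 12}^{2} \;=\; \frac{r_{y1}^{2} + r_{y2}^{2} - 2\, r_{y1}\, r_{y2}\, r_{12}}{1 - r_{12}^{2}}
```

Suppose each test alone correlates $r_{y1} = r_{y2} = 0.40$ with success in training. If the two tests overlap heavily ($r_{12} = 0.80$), then $R^{2} = 0.064 / 0.36 \approx 0.178$ and $R \approx 0.42$, scarcely better than either test alone. If they overlap little ($r_{12} = 0.20$), then $R^{2} = 0.256 / 0.96 \approx 0.267$ and $R \approx 0.52$.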


Statistical Records and Analysis 


The administering of twenty tests to each of several hundred men every 
day has necessitated the development of new procedures for administering 
and scoring tests and processing the results. The answers for the pencil 
and paper tests are recorded on separate answer sheets and scored by 
electrical test-scoring machines. A considerable amount of research has 
been done on the effect of time of day, sequence, size of group, examiner,
previous observation, practice, seating location, and similar factors on
test scores. 

Aptitude scores based on the best weighted combination of the test 
scores are obtained for each of the three types of training (pilot, bom- 
bardier, and navigator) for each individual. This is accomplished by 
recording all the scores for an individual on a special weighting sheet. 
These sheets may be run thru the scoring machine with a particular set 
of weights set in the machine, and the combined score, properly weighted, 
read off the dial. A second scoring of the sheets with a new set of weights 
gives the aptitude score for another type of training. This large-scale
application of multiple correlation procedures has led to numerous dis-
coveries concerning these procedures.

In summarizing the development
and use of tests in the selection and classification of aircrew personnel 
for the Army Air Forces, it seems appropriate to compare the recent prog- 
ress made by psychologists in predicting success in flying training schools 
with the progress made by psychologists in the past two decades in pre- 
dicting success in the academic schools of the country. In the past two 
years, by means of the coordinated research program in the Army Air 
Forces, more progress has been made in the development of tests and 
procedures for predicting ability to succeed in flying training than was 
made in developing tests and procedures for predicting ability to succeed 
in academic courses in the preceding twenty years. 
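
The weighting-sheet procedure described in this section amounts to forming a weighted sum of each man's test scores, once for each type of training. A minimal sketch follows. The function name `aptitude_score`, the four test scores, and the weights are all hypothetical; the actual tests and weights are not given in the source.

```python
def aptitude_score(test_scores, weights):
    """Weighted sum of one man's test scores under one set of weights."""
    if len(test_scores) != len(weights):
        raise ValueError("need exactly one weight per test score")
    return sum(w * x for w, x in zip(weights, test_scores))


# Hypothetical scores for one applicant on four tests.
scores = [6, 4, 7, 5]

# Hypothetical weights; rescoring the same sheet with a different set of
# weights corresponds to a second pass through the scoring machine.
WEIGHTS = {
    "pilot": [3, 1, 2, 0],
    "bombardier": [1, 2, 0, 3],
    "navigator": [0, 3, 2, 1],
}

aptitudes = {job: aptitude_score(scores, w) for job, w in WEIGHTS.items()}
print(aptitudes)
```

In practice the weights would come from a multiple regression of training success on the test scores, which is what the text calls the "best weighted combination."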


Use of Psychological Tests in Naval Aviation 
Lieutenant Commander John G. Jenkins, USNR 


In the late 1930’s certain naval flight surgeons in responsible positions 
became dissatisfied with the traditional attempt to determine psychological 
adaptability to flight duty by means of an interview. In 1939 a grant of 
funds from the Civil Aeronautics Authority led to the establishment of 
the Committee on Selection and Training of Aircraft Pilots under the 
National Research Council. When this Committee was established, the 
military services were represented by membership in its advisory group. 
In 1940 the Navy representatives invited the Committee to establish a 
project, at a naval aviation training center, to investigate the possible 
applicability of psychological tests to the selection of naval aviators. 

Following the establishment of this project, a number of psychologists 
were commissioned in the Naval Reserve and called to active duty. In 
close collaboration with the CAA-NRC Committee, these psychologists 
were authorized to administer certain tests to all applicants for naval 
aviation training. By the fall of 1941, enough of these cadets had com- 
pleted training to permit preliminary validation of the tests. Within 
three weeks after Pearl Harbor, the Navy required that all future candi- 
dates for aviation training should meet specified cutting scores on those 
tests which had proved to be the most useful predictors. It may be reported
that the tests have held up well under continued use. Uncontaminated
criterion data are now available on several thousand cadets who have 
completed training; and the indexes of predictive efficiency have changed 
very little from those originally obtained on a much smaller group. 

Test scores have been, and are now, used in naval aviation in two 
ways. First, they are used as a basic screening device when candidates 
announce their desire to enter naval aviation. The second use of the 
test scores is somewhat more novel. This is their employment as evidence 
to be considered if the cadet ever presents himself as a borderline case. 
Thus, if a cadet is a clear-cut failure or a clear-cut passer in aviation 
training, his test scores are never again brought into consideration, once 
he has cleared the original screening process. Because of the complexity 
of factors operating during training, however, there is ordinarily a con- 
siderable group of “borderline” cases. 

Experience has shown that there are, in naval aviation, other uses 
for tests which cannot be discussed at this time. It may readily be said, 
however, that the biggest task in each case has been to find a satisfactory 
and dependable criterion. Once this has been found, the possibility of 
prediction has thus far always turned into a reality. 

In summary, it has been found possible to validate a battery of tests 
for selecting naval aviators which has gained wholehearted acceptance 
on the part of the line and staff officers concerned. This acceptance has 
served to pave the way for invitations to work with those problems in 
learning—on the ground and in the air—to which the professional train- 
ing of the psychologist is particularly well suited. The opportunity for 
such contributions is much enhanced by the willingness of the psy- 
chologist to put himself in the position of undergoing training comparable 
to that of the average cadet. 


Psychological Tests Used in the U. S. Navy 
Commander Alvin C. Eurich, USNR 


Psychological tests are used by the United States Navy for the classifica- 
tion of officers and of enlisted men. They may also be grouped as 
general tests and special tests. 


Tests for Enlisted Personnel 


The Navy Basic Test Battery consists of seven tests. The Battery is ad- 
ministered to all enlisted personnel during the first three weeks of 
their recruit training. The results are used along with other data in assign-
ing men to the type of service school most likely to utilize their peculiar
aptitudes and abilities or to general sea duty. The following tests are in-
cluded in the Battery:


1. The General Classification Test (GCT) is designed to measure the verbal ability 
long known to correlate with scholastic success. It is composed of three subtests: 
Sentence Comprehension, Opposites, and Analogies. 

2. The Reading Test is designed to measure the ability to read and to comprehend
the large amount of printed material which all personnel must use. The test content 
was selected from navy publications read by all enlisted men. 

3. The Mechanical Aptitude Test is designed to measure potential success in work 
of a mechanical nature. It consists of three subtests: Block Counting, Mechanical 
Comprehension, and Surface Development. 

4. The Mechanical Knowledge Test is designed to measure achievement along 
mechanical lines. It is composed of two subtests: Tool Relationships and Mechanical 
Information, each of which is again subdivided into electrical and mechanical sections. 

5. The Mathematical Reasoning Test provides an indication of the individual’s 
ability to calculate and to think in quantitative terms. This ability bears a demonstrable 
correlation with success in an increasingly mechanized and quantified Navy. 

6. The Radio Code Aptitude Test samples speed and accuracy of code recognition 
in terms of recognition of three standard symbols following brief exposure. 

7. The Clerical Aptitude Test is intended as an index of spelling ability and the 
perceptual speed and accuracy necessary for business detail work of various sorts. 


Enlisted personnel of the WAVES are given the Enlisted Qualifica- 
tion Test (WR) when they make their application. This test is designed 
as a negative screen for three broad types of mental abilities: linguistic, 
mechanical, and arithmetical. During their recruit training the Navy 
Basic Test Battery is administered with the exception of the Mechanical 
Knowledge Test. In its place all enlisted WAVES are given an English 
test intended to determine the degree of their mastery of punctuation, 
capitalization, usage, diction, and of the distinction between clear, complete 
sentences and incomplete or faulty ones. 


Tests used in assigning enlisted men to special schools or duties consist 
of the following: 


1. At the underwater sound school, a special battery consists of a mechanical com-
prehension test and measures of tonal and pitch discrimination.

2. Assignees to radio material are selected by a Pre-Radar Material Selection Test,
composed of questions on arithmetic, mathematics, physics, shop practice, electricity,
and radio.

3. The Short Test of Opinions, given to WAVES candidates for link-trainer in-
structor, is designed to eliminate those persons whose work attitudes are potentially
unsatisfactory.

4. Four subtests of the Thurstone Primary Mental Abilities are used to identify
WAVES candidates for control tower operator (flags, figures, pedigrees, and letter
grouping).

5. Achievement tests in typing and shorthand are used to aid in selection of WAVES
yeomen (clerical workers) who will be rated as petty officers upon completing recruit
training.


Tests for Officer Personnel


The Officer Qualification Test is used in offices of Naval Officer Pro-
curement to aid in the selection of male officers, officers (WR), and
SPAR officers from among those persons who apply for a commission.
The test is designed as a negative screen in the areas of linguistic, me-
chanical, and arithmetical ability.

A special Officer Battery is employed in selecting midshipman candi-
dates for Radar Officers’ School. The tests involved are a Pre-Radar Ap-
titude Test (with subtests on mechanical aptitude, number series, and
analogies); a spiral omnibus Mathematics Test; and a General Physics
Test, which emphasizes modern physics, light, and sound. A similar
test is the brief Engineer Classification examination, an achievement 
test designed to select experienced men for Radio Materiel School. It is 
composed of sixty items in the fields of mathematics, physics, electrical 
engineering, mechanical practice, and radio. 

Finally, the Navy tests civilian applicants for the (V-12) College Pro-
gram. The test employed has four sections: a verbal section, a section
on science information, a reading section, and a mathematics section. 
This same test is also used by the Army as an aid in selecting men for 
the Army Specialized Training Program. It is administered twice a year at 
all high schools and colleges of the country. 

Aside from these tests, the Navy is also constructing a wide variety 
of achievement examinations which are used in the training schools and 
along with self-instructional training courses. The results of these tests 
are likewise used as aids in the classification of personnel. 

New forms and revisions of these tests are constantly being constructed 
in the Test and Research Unit of the Standards and Curriculum Section, 
Training Division of the Bureau of Naval Personnel. The Navy receives 
assistance and cooperation in this work from the National Defense Re- 
search Committee and the College Entrance Examination Board. 


CHAPTER X


Statistical Methods Related to Test Construction 
and Evaluation 


HERBERT S. CONRAD 


This REVIEW continues the survey by Flanagan in the February 1941 
issue of the Review. Specialized surveys, by Fattu and by Lorge, ap- 
peared in the December 1942 issue. Certain important studies have 
been deliberately excluded from the present review, because they are 
already included in the bibliographies of Fattu and Lorge. For a compre- 
hensive view of the current picture, the three reviews (Fattu, Lorge, and 
the present one) should preferably be read together. 


Bibliographies 

Bibliographic references to studies on statistical method may be found 
in Psychological Abstracts; Education Abstracts; Education Index; the 
annual surveys by Swineford and Holzinger (129); the annual bibliog- 
raphies by Good (51); the Yearbooks edited by Buros (11); the section 
on “Research Abstracts and Bibliographies” in the Journal of Educational 
Research; a similar section in Educational and Psychological Measure- 
ment; and occasional surveys of progress in the Journal of the American 
Statistical Association, the Annals of Mathematical Statistics, the Journal
of the Royal Statistical Society, and special issues of the Review of Edu-
cational Research.


Texts 


New texts include those by Chambers (20), Guilford (53), Lindquist 
(revised edition) (79), Mode (89), and Walker (148). Workbooks or 
study manuals have been prepared by Lindquist (revised edition) (81), 
Nelson and Denny (93), and Stone and Georges (128). Notable texts 
on mathematical statistics include those by Wilks (154) and by Ait- 
ken (1). 


Grouping of Measures 


In two able empirical studies, Tildesley (136, 137) evaluated the ef- 
fect of coarseness of grouping on the accuracy of the mean and the stand- 
ard deviation. Davies and Bruner (32) developed a new formula, more 
generally applicable than Sheppard’s, for correcting the standard devia-
tion for grouping. Pierce (101) presented correction formulas designed
to yield the exact (ungrouped) values of the mean, the standard
deviation, and higher moments. Bloom (7), in a study which disregarded
aspects other than errors of measurement, recommended much coarser 
grouping than is generally employed. 
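
The grouping corrections discussed above can be illustrated with the classical Sheppard correction, which subtracts one-twelfth of the squared class width from the variance computed on class midpoints. The sketch below is only an illustration under that textbook formula; it does not reproduce the formulas of Davies and Bruner or of Pierce, and the function names and the frequency table are hypothetical.

```python
import math

def grouped_moments(midpoints, freqs):
    """Mean and population variance computed from class midpoints."""
    n = sum(freqs)
    mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
    var = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs)) / n
    return mean, var

def sheppard_sd(midpoints, freqs, width):
    """Standard deviation after Sheppard's correction: var - width**2 / 12."""
    _, var = grouped_moments(midpoints, freqs)
    return math.sqrt(var - width ** 2 / 12)

# Hypothetical frequency table with a class width of 10 score points.
midpoints = [5, 15, 25, 35, 45]
freqs = [2, 8, 20, 8, 2]

mean, var = grouped_moments(midpoints, freqs)
sd_raw = math.sqrt(var)
sd_corrected = sheppard_sd(midpoints, freqs, width=10)
print(mean, round(sd_raw, 3), round(sd_corrected, 3))
```

The coarser the grouping, the larger the subtracted term, so the corrected standard deviation is always somewhat smaller than the uncorrected one.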


Skewness and Kurtosis


Raiford (105) presented a formula for the measurement of skewness 
in combined sets of data. Scates (117), in a thorogoing paper, exposed 
the characteristics and deficiencies of Pearson’s moment coefficient of 
kurtosis. 


Correlation 


Texts—The volume by Ezekiel (42), now available in a second edition, 
remains preeminent in this field. Chapter 19, on the reliability of an in- 
dividual forecast, deserves special mention. A more mathematical treat- 
ment of correlation is given by Treloar (140). 

Calculational methods and aids—Platt (102) developed a device for 
the mechanical determination of correlation coefficients and standard 
deviations; this may also be employed to obtain the sum of squared 
deviations, and the total product-moment. Schumann (120) described 
a mechanical method for the rapid calculation of regression coefficients 
and the solution of simultaneous linear equations. Bloom and Lubin 
(8) presented a technic for obtaining correlations and intercorrelations 
with the aid of the graphic item-counter of the International Business 
Machines Corporation test-scoring machine. Dwyer (36) compared the 
efficiency of the calculating machine with that of the punched-card equip- 
ment for calculating correlations. Assuming the use of ordinary hand 
and calculating-machine methods, Stead and Shartle (126) prepared a 
detailed set of directions for various statistical procedures, including the 
Wherry-Doolittle technic (151). Jackson (67) supplied procedures and 
formulas to obtain approximate multiple regression weights. A general 
review of recent advances in the calculation of correlation (particularly 
partial and multiple correlation) was presented by Dwyer (37). Guttman 
and Cohen (58) showed how regression coefficients and partial and 
multiple correlations may be calculated directly from an orthogonal 
factorial matrix. 

Biserial r—Peters (98) developed a technic for calculating biserial r 
from widespread classes. Royer (113) and DuBois (34) outlined pro- 
cedures for computing the biserial correlation with the aid of Hollerith 
machines. Richardson (110) pointed out the conditions under which bi- 
serial r may exceed 1.00. 

Corrections to r—Casanova (18) formulated the correction required 
on account of restricted variability in one variable. A more complete 
solution of the problem of correction of r for restricted or expanded 
variability was given by Emmett (39); Emmett’s formulas have been 
employed, in a less mathematical setting, by Burt (13). 
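
As an illustration of what a correction for restricted or expanded variability involves, the sketch below applies the commonly cited univariate correction, in which the observed r is adjusted by the ratio k of the unrestricted to the restricted standard deviation of the selected variable. This is the standard textbook form and is not claimed to be the formula of Casanova or of Emmett; the function name and the numerical values are hypothetical.

```python
import math

def correct_for_variability(r, k):
    """Adjust r for a change in variability of the selected variable.

    k is the ratio of the standard deviation in the target group to the
    standard deviation in the group where r was observed. k > 1 corrects
    for restriction; k < 1 estimates r in a more homogeneous group.
    """
    return r * k / math.sqrt(1 - r ** 2 + (r ** 2) * (k ** 2))

# Hypothetical case: r = .30 in a selected group whose spread is half
# that of the unselected applicants (k = 2).
r_unrestricted = correct_for_variability(0.30, 2.0)
print(round(r_unrestricted, 3))
```

With k = 1 the formula returns r unchanged, which is a quick check that the direction of the ratio has not been inverted.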

Regression—Baker (3) outlined a procedure for the fitting of linear 
regression lines when the standard deviations of arrays are not all equal. 
Use of a logarithmic transformation of data as one means of rectifying 
curvilinear regression was suggested by Schrek (119); Humm (65, 66) 
proposed another technic, which has been criticized by Johnson (70, 71). 

Miscellaneous—Scates (118) presented formulas by which the correla-
tion coefficient and other statistical constants in a combined group may
be calculated from values known for the component groups. Conrad and 
Sanford (26) provided a formula by which the average intercorrelation 
among test items may be approximated from knowledge of the standard 
deviations of the items and the reliability of the total test. Sarbin (115) 
showed that for the prediction of academic status, the subjective, clinical 
approach added nothing to the efficiency of prediction achieved by the 
objective procedure of correlations. Mowbray (92), protesting against a 
purely empirical or ad hoc approach, declared that “the first requirement 
of any correlation analysis is a rational explanation of its regression
equation” (92: 248).


Reliability Coefficient and Accuracy of Measurement


Reliability coefficient—Mosier (90) derived formulas to determine both 
the reliability of a composite and the best weights (from the viewpoint of 
maximal reliability) for the components of a composite. Ghiselli (49) 
described a procedure for determining the minimal reliability of a test 
when it is impossible to divide the test into two equivalent parts. The Spear- 
man-Brown formula was shown by Bruce (10) to overpredict the re- 
liability coefficient of average grades based on more than one semester's 
work in college. A formula for the reliability of profile records or 
psychographs was provided by Edgerton, Bordin, and Molish (38). 
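
For reference, the Spearman-Brown formula mentioned above predicts the reliability of a measure lengthened k times from the reliability r of a single unit. The sketch below states the formula and its inverse; the function names and the figures are hypothetical and are not taken from Bruce's study.

```python
def spearman_brown(r, k):
    """Predicted reliability when a measure of reliability r is lengthened k times."""
    return k * r / (1 + (k - 1) * r)

def length_needed(r, target):
    """Lengthening factor k required to reach a target reliability."""
    return target * (1 - r) / (r * (1 - target))

# Hypothetical case: one semester's grades have reliability .60.
two_semesters = spearman_brown(0.60, 2)
k_for_90 = length_needed(0.60, 0.90)
print(round(two_semesters, 2), round(k_for_90, 1))
```

The prediction assumes that the added units are equivalent to the original one; where successive semesters differ in content or in the student's circumstances, an overprediction of the kind Bruce reported is to be expected.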

Accuracy of measurement—As a measure of the reliability of a test,
the ratio $\sqrt{r_{11}}/\sqrt{1-r_{11}}$ was recommended by Butler (15). Bloom (7)
pointed out that extreme scores have a smaller error of measurement, 
in terms of percentile units, than scores near the median; he also presented 
a formula to determine the decrease in error resulting from the combining 
of scores. Horn (62) made a systematic analysis of the influence of chance 
error and “specific factors” on individual scores and on various statistical 
measures. 


Factor Analysis 


New technics (not covered in previous reviews) include the application 
of the maximum-likelihood method to determine the proper number of 
factors (77); factorization by means of Fisher’s discriminant function 
(94); an inspectional, cluster-building method for the identification of 
factors (29); and a technic for factor analysis of the abilities of the single 
individual (104). Additional progress includes improved facility in com- 
putation (64, 144, 145), and increased agreement in results by different 
factor-analysis technics (61). Disagreement in findings, however, still 
may be found; thus McNemar (85), in an analysis of the revised Stan- 
ford-Binet, found one main general factor, with other factors negligible; 
while Burt and John (14) found numerous factors for the old Stanford- 
Binet. Disagreement still exists, also, concerning the importance or ad- 
vantages of orthogonality (independence) among factors. The early hope 
that factor analysis would reduce the number of essential traits to only
a very few has failed to materialize. For example, see the results reported 
by Burt and John (14) and by Cardall (17). Even less has it proved 
possible to measure factors directly by specially constructed, new tests; 
this led Burt (13) to a characterization of factors as “weighted patterns 
of test scores”—a view which, at least on the surface, fails to accord with 
“primary traits” or “essential principles of classification.” 

An important empirical study by Swineford and Holzinger (130) ob- 
tained retest correlations of .65-.80 for factor-scores of 385 high-school 
pupils measured in their freshman and sophomore years. 

Criticisms of factor analysis include Stalnaker’s demand for improve- 
ment of the tests employed in factor analysis (125); Thomson’s rec- 
ommendation that time scores, in order to be useful in factor analysis, 
should always represent the time taken for correct performance, rather 
than time for correct performance in one case and incorrect performance 
in another (133); and (among others) Reyburn and Taylor’s observation 
that random errors are likely to affect factors beyond the first excessively 
(108). Unless very highly reliable tests are used in large samples, it 
seems unlikely that the small, later factors can be satisfactorily reliable 
(52); the frequent use of tetrachoric r, instead of the more reliable 
product-moment correlation, does not help this situation. 


Alternatives to the Pearson r 


In an application of the Bernstein correlation coefficient, Read and 
Conrad (106) judged that, at least for their particular study, the Bern- 
stein r yielded results “less complimentary but more valid” than the 
Pearson r. Horn (63) derived a formula to correct the rank-difference 
correlation for the effect of tied ranks. Kendall (73) developed a method 
of determining partial correlation from ranks. Johnson (69) urged the 
use of contingency- rather than correlation-technic, when the trait to be 
predicted is the attainment of some critical score. 
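
The effect of tied ranks on the rank-difference correlation can be handled in more than one way. The sketch below uses a standard alternative to a correction formula: tied scores receive the average of the ranks they occupy, and the product-moment correlation is then computed on the ranks. It is not Horn's formula, and the function names and data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def rank_correlation(x, y):
    """Rank correlation that remains valid when ties are present."""
    return pearson(average_ranks(x), average_ranks(y))

rho_tied = rank_correlation([10, 20, 20, 30], [1, 2, 3, 4])
rho_untied = rank_correlation([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(rho_tied, 4), round(rho_untied, 4))
```

When no ties occur, this procedure gives the same value as the usual rank-difference formula, 1 - 6Σd²/[n(n² - 1)].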


Discriminant Function 


Garrett (46) has contributed a helpful exposition and review of 
Fisher’s discriminant-function technic for the maximal differentiation of 
two groups. The application of the discriminant function to differentiation
of more than two groups is outlined and illustrated by Day and Sando-
mire (33).


Scores 


Because of the insensitivity of the IQ to changes in growth rate, 
Herring (60) recommended the direct measurement of rate of growth by a 
“mental quotient” (mQ, defined by a formula in which the subscripts 1 and
2 represent the first and subsequent measurement, respectively). Courtis (27)
has urged the expression of present status in terms of estimated status at
maturity, and the matching of cases in educational experiments not
only on present score but also on growth rate. The advantages of the 
“median score” method for computing the final score on a mental-test 
scale were set forth by Kuhlmann (76). Wiley and Wiley (152) presented 
two “progress indices”; one of these, the “progress differential,” shows 
the comparative rate of progress of the upper versus the lower portions 
of a class. 


Scoring Formulas 


In studies of the true-false test, Cronbach (30) pointed out that “ac-
quiescence” (the tendency to mark an item “true” rather than “false” when
guessing) impairs the validity of the conventional R minus W formula for
scoring this type of test. Searle (121) developed scoring formulas ap- 
propriate for a multiple true-false (modified multiple-choice) type of 
item. Several methods for scoring the “rearrangement” or sequential- 
response test were contributed by Rosander (112). Cox and Harsh (28) 
outlined an objective procedure for eliminating unjustified discrepancy 
in the grades given by different instructors in a college course. 
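
The conventional R minus W formula is the two-choice case of the general correction for guessing, R - W/(k - 1), where k is the number of alternatives. The sketch below states that formula and then illustrates, under simple assumptions of my own rather than Cronbach's analysis, why a tendency to guess "true" matters: the expected corrected score of a pure guesser is zero only when the key is evenly balanced. All names and figures are hypothetical.

```python
def corrected_score(right, wrong, choices=2):
    """Correction for guessing: R - W / (k - 1); R - W when k = 2."""
    return right - wrong / (choices - 1)

def expected_guess_score(n_items, prop_keyed_true, prob_mark_true):
    """Expected R - W for an examinee who guesses every true-false item.

    prop_keyed_true: proportion of items whose keyed answer is "true".
    prob_mark_true: probability that the guesser marks "true".
    """
    p_right = (prop_keyed_true * prob_mark_true
               + (1 - prop_keyed_true) * (1 - prob_mark_true))
    right = n_items * p_right
    wrong = n_items - right
    return corrected_score(right, wrong)

score = corrected_score(right=70, wrong=20)            # 10 items omitted
balanced = expected_guess_score(100, 0.50, 0.70)       # evenly balanced key
unbalanced = expected_guess_score(100, 0.60, 0.70)     # key favors "true"
print(score, round(balanced, 2), round(unbalanced, 2))
```

With a key that favors "true," the guesser who also favors "true" earns a positive expected score, so the correction no longer removes the effect of guessing.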


Practical Scoring Procedures 


An ingenious device, by which incorrect answers or omissions stand
out vividly when viewed by ultra-violet light, has been developed by
Wallen and Rieveschl (149) to facilitate test scoring. For the scoring of 
performance-test products, Toops (138) proposed the use of code-numbers 
printed on appropriate spots and self-reported by the subject. 


Item Analysis and Weighting 


A convenient summary of methods of item analysis has been presented 
by Guilford (53, Chapter 14). Stead and Shartle (126, Appendix VII) 
described a simple method of item analysis in use by the U. S. Employment 
Service. For the facilitation of the burdensome clerical task of an item 
analysis, Lawshe (78) prepared a nomograph, and Fulcher and Zubin 
(44) devised a mechanical method for treating the fourfold table. Kroll
(75), investigating the question whether item analysis results in an
actually improved test, came to a definitely favorable conclusion; Travers 
(139), however, indicated circumstances under which the gain from item 
analysis is not large enough to justify the labor and expense involved. 
The use of graded weights for responses to items is a more quantitative 
or refined procedure than simply accepting or rejecting items for use in 
a test. In a rather elaborate study, Guilford, Lovell, and Williams (55) 
showed that “complete weighting” of multiple-choice responses—i.e.,
assignment of a differential weight to each of the four alternatives of- 
fered by the item—failed to yield more than a negligible improvement 
in reliability or validity. Essentially the same results were reported by 
Phillips (100). Casanova (19) prepared a table and facilitating graph 
for the weighting of tests in terms of their reliability, estimated directly 
from the length of each test. No evidence of the superiority of this pro-
cedure over the simple summation of raw scores was supplied.


Machine Methods 


Anyone interested in machine methods of tabulating, scoring, or 
computing should consult the recent comprehensive review by Lorge. 
The main computational application of machine methods has been in 
connection with correlation and factor analysis (see the sections so 
named above). Bloom and Lubin (8) described applications of the IBM 
test-scoring machine for statistical work, including correlation. Simon 
(123) showed how, by a single run thru the IBM test-scoring machine, six 
part-scores from one side of a standard IBM answer sheet may be ob- 
tained. DuBois (35) described the use of the IBM counting sorter for 
computing the mean and standard deviation of two-digit variables, and 
computing the correlation between variables whose scores have been 
coded in a single column each of the Hollerith card. McQuitty (87) and 
Watkins (150) outlined the machine procedures for such tasks as the 
preparation of class rolls, recording absences, scoring of test papers, 
preparation of frequency distributions, and posting of grades. 


Statistics of the Individual Case 


Measurements of the individual are the basic source of most educational 
and psychological statistics, yet statistics has devoted little attention
on the individual as such. The studies mentioned in this section have a 
closer relation to the individual case than do most others. Trimble and 
Cronbach (142) devised a graphic procedure by which a pupil’s growth 
(i.e., the difference between initial and final scores) may be checked 
for reliability and compared with the average growth of a group. Garrett 
(47) pointed out the conditions for a meaningful classification of a 
given case into one of two groups. The measure of profile-reliability sup- 
plied by Edgerton, Bordin, and Molish (38) is of value in the study of the 
individual case; so, too, is Ezekiel’s careful presentation of the error of an 
individual forecast (42, Chapter 19). 

Two studies on interpretation of the AQ (54, 68) deserve mention in 
this section. Three studies have concerned themselves explicitly with the 
variation among abilities of the individual: Preston (103) provided an 
analytic technic for determining average variability of the individual in a 
set of traits for which intercorrelations have been calculated; Clarke (22) 
proposed a coefficient of “ubiquity” or inconsistency, to measure varia-
tion of the individual’s performance within a single ability-field; and
Primoff (104) developed a method for factor analysis of the abilities of
the individual.


Sampling 


General—A clear discussion of many recent developments in sampling, 
together with important new formulas, has been given by McNemar (86). 
Other useful presentations, somewhat more recent, are the mimeographed
brochure by Treloar (141) and—for the mathematically trained reader— 
the survey of recent progress by Camp (16), and the lithoprinted book 
by Wilks (154). 

Methods of sampling—Peatman and Schafer (96) presented a new 
table of random numbers, together with illustrations of its use for random 
sampling. In a paper which brings out the distinctions between stratified, 
random, and “haphazard or fortuitous” sampling, Lindquist (80) urged 
the superiority of stratified sampling, for the avoidance of systematic 
errors. Anderson (2), in a mathematical paper, showed that the mean 
and standard deviation obtained by stratified sampling have less variable 
and less skewed sampling-distributions than are given by random sampling. 
Osborne (95), while indicating some practical and theoretical handicaps 
of stratified sampling, emphasized that samples systematically selected 
could yield results far more accurate than random samples of equivalent 
size. 

Formulas, tables, and graphs—Several aids for the interpretation of 
percentages have appeared: Burr and Hobson (12) outlined a quick way 
to determine the statistical reliability of the difference between two per- 
centages, when both percentages are derived from samples of equal size; 
Wilks (153) presented three charts to facilitate the determination of 
confidence limits; and Stephan (127) gave the formula for the standard 
error of a weighted proportion or percentage. A method for determining 
the standard error of any percentile (including percentiles in a subgroup 
of a stratified sample) was developed by Evans (41). Festinger (43) 
derived a formula for the reliability of the difference between means from 
highly skewed distributions. Tables to speed the calculation of the stand- 
ard error of tetrachoric r were provided both by Guilford and Lyons (56) 
and Hayes (59). Merrington (88) prepared new and fuller tables of 
the distribution of t for various levels of P. The problem of estimating 
the mean of a population when the sampling-units are of unequal size 
(e.g., schools with varying enrolments) was attacked by Cochran (24). 
A highly general solution of the problem of whether two samples might 
have been drawn from the same population was presented by Mathison 
(84); but no statement was given as to the comparative power or 
sensitivity of the statistical test employed. 
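
As a point of reference for the aids listed above, the reliability of a difference between two percentages from samples of equal size is ordinarily judged by the large-sample standard error of the difference. The sketch below uses the pooled form of that standard error and a normal approximation; it is not the shortcut of Burr and Hobson, and the function name and the figures are hypothetical.

```python
from statistics import NormalDist

def difference_of_percentages(p1, p2, n):
    """Standard error, critical ratio, and two-sided P for p1 - p2.

    p1 and p2 are proportions from two independent samples of n cases each.
    The pooled proportion is used, as is appropriate when testing whether
    the two samples could have a common population value.
    """
    pooled = (p1 + p2) / 2
    se = (2 * pooled * (1 - pooled) / n) ** 0.5
    z = (p1 - p2) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p_two_sided

# Hypothetical case: 60 percent versus 50 percent, 200 cases in each sample.
se, z, p_value = difference_of_percentages(0.60, 0.50, 200)
print(round(se, 4), round(z, 2), round(p_value, 3))
```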

Small samples and “exact” test—Both Ezekiel (42: 24) and Treloar 
(141: 29) have emphasized the practical undependability of the t-test 
for the difference between means, when the samples are small. Treloar 
(141: 3) has also emphasized that a small sample fails to supply de- 
pendable information concerning the form of the parent-distribution; such 
knowledge is essential in any properly termed “exact” test. 

Unsettled issues and criticisms—(a) “Hypotheses,” null and otherwise— 
Modern sampling-theory is concerned with two types of errors: Type I, 
the risk of rejecting a hypothesis when true; and Type II, the risk of ac- 
cepting a hypothesis when false (154: 152). The inadequacy of the 
null hypothesis by itself is now generally recognized; Berkson (6) has
gone further and questioned the general logic of this hypothesis. Berkson’s 
objections to the null hypothesis extend to “the frequency chi-square test 
and many applications of the analysis of variance” (6: 243). (b) Defini-
tion of parent-population—In interpreting the difference between (e.g.)
$M_x$ and $M_y$, it is possible to ask if the values of x and y could have come
from either (i) the same population or (ii) two populations with
the same mean (141: 2). The former is the mathematically more con- 
venient formulation, but the latter seems the more pertinent, if one’s in- 
terest extends specifically to the means. The two formulations do not 
necessarily lead to the same answer. (c) Choice of critical region— 
As Wilks said (154: 152), any test of a statistical hypothesis consists in the 
choice of a “critical region,” and the determination of the probability (on 
the hypothesis being tested) that values of the statistical measure in 
question will fall within this critical region. Since the choice of an 
optimum critical region is, to some extent, discretionary, one can readily 
understand the occurrence of “actual situations in which . . . when the 
null hypothesis was tested by several tests, each put forward with equally 
good authority, the P’s were considerably different” (6: 242). In this 
connection, Mises’ paper (146) on a criterion for the choice of statistical 
tests is of interest. An example of disagreement in the application of 
statistical tests is Kelley’s (72) reworking of conclusions reached by an-
other author. Another example is the proposal, by Bonnier (9), to
test heterogeneity (or correlation) in a fourfold table by a system
which would yield “safer” but lower estimates of
significance than are generally obtained. As Wilks has mildly remarked 
on the choice of critical regions, “there is still much work to be done” 
(154: 156). An important consideration here is the need for flexible defi- 
nition or application of the critical space, according as errors of Type I 
or Type II are, in the particular practical problem, considered more im- 
portant or less important, respectively. (d) The assumption of pure 
ignorance—Statistical tests of hypotheses assume that the information 
provided by the particular sample is the only information available. This 
is generally far from being the case. 
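
The relation between the two types of error named under (a), and the effect of moving the critical region discussed under (c), can be made concrete in the simplest case: a one-sided test of a mean with known standard deviation. The sketch below is an illustration under those assumptions only; the function name and all numerical values are hypothetical.

```python
from statistics import NormalDist

def type_ii_risk(mu0, mu1, sigma, n, alpha):
    """Risk of accepting mu0 when the true mean is mu1 (mu1 > mu0).

    The critical region is the upper tail of the sampling distribution
    of the mean under mu0, chosen so that the Type I risk equals alpha.
    """
    se = sigma / n ** 0.5
    critical = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    return NormalDist(mu1, se).cdf(critical)

# Hypothetical case: hypothesis mean 100, true mean 105, sigma 15, n = 36.
beta_05 = type_ii_risk(100, 105, 15, 36, alpha=0.05)
beta_01 = type_ii_risk(100, 105, 15, 36, alpha=0.01)
print(round(beta_05, 3), round(beta_01, 3))
```

Tightening the Type I risk from .05 to .01 enlarges the Type II risk for the same sample, which is why the choice of critical region should depend on which error is the more costly in the practical problem.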


Grouping and the Chi-Square Test 


Pointing out the effect of differences in grouping upon the chi-square 
test of goodness of fit between an observed and hypothetical distribution, 
Gumbel (57) recommended the use of intervals containing equal frequen-
cies. In a different approach, Mann and Wald (82) derived a formula by
which the optimum number of intervals (of equal length) could be de-
termined.
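
One way to carry out a chi-square test with intervals of equal frequency is to place the interval boundaries at equally spaced quantiles of the hypothetical distribution, so that every interval has the same expected count. The sketch below does this for a hypothetical normal distribution; it illustrates the general idea only and is not claimed to follow Gumbel's procedure in detail. The function name and data are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist

def equal_frequency_chi_square(data, dist, k):
    """Chi-square of data against dist, using k equal-probability intervals."""
    cuts = [dist.inv_cdf(i / k) for i in range(1, k)]   # k - 1 boundaries
    observed = [0] * k
    for x in data:
        observed[bisect_right(cuts, x)] += 1            # interval index 0..k-1
    expected = len(data) / k                            # same for every interval
    chi_sq = sum((o - expected) ** 2 / expected for o in observed)
    return observed, chi_sq

hypothesis = NormalDist(mu=50, sigma=10)

# An artificial sample spaced exactly as the hypothesis predicts,
# and the same sample displaced upward by one standard deviation.
ideal = [hypothesis.inv_cdf((i + 0.5) / 40) for i in range(40)]
shifted = [x + 10 for x in ideal]

obs_ideal, chi_ideal = equal_frequency_chi_square(ideal, hypothesis, k=4)
obs_shift, chi_shift = equal_frequency_chi_square(shifted, hypothesis, k=4)
print(obs_ideal, chi_ideal)
print(obs_shift, round(chi_shift, 1))
```

When the parameters of the hypothetical distribution are fixed in advance, the resulting value is referred to the chi-square distribution with k - 1 degrees of freedom.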


Matched-Group Technic


Koenker and Hansen (74) set forth a detailed numerical illustration 
of the steps in the Johnson-Neyman technic for evaluating a difference in 
final scores of matched groups. Less elaborate yet highly serviceable
technics have been provided by Peters (97) and by Shen (122); a solu-
tion by application of analysis of variance was presented by Engelhart
(40). Thorndike (135) considered the possible role of regression effects in
the matching of groups.


Analysis of Variance 


In a helpful article Baxter (5) rephrased the technical vocabulary of
analysis of variance in terms of psychological concepts and methods. 
Garrett (45) discussed some of the advantages of analysis of variance in 
psychological research, while Garrett and Zubin (48) gave a general 
review of the applications of analysis of variance. The use of the Latin 
square in the design of educational experiments was discussed and illus-
trated by Thomson (134). Taylor (131) emphasized the value of analysis
of interactions.

Technical papers on the technic of analysis of variance include an 
empirical study by Godard and Lindquist (50) evaluating the importance 
of homogeneity of within-groups variance; a paper by Baxter (4) con- 
cerning the influence of errors of measurement; three papers on the 
adaptation of procedure to unequal numbers of observations in the sub-
classes (20, 25, 143); a mathematical report by Wald (147) on measuring
the efficiency of design of a statistical investigation; and a paper by Snede- 
cor and King (124) on the relation of cost to the factorial design of ex- 
periments. 

The equivalence or relation of analysis of variance to other statistical 
technics has been observed in several papers. Engelhart (40) and Rulon 
(114) both pointed out the equivalence between the t-test or “critical 
ratio” and the z- or F-test in analysis of variance. At several points Gar-
rett and Zubin (48) indicated the similarity between correlation and
analysis of variance. Both Peters and VanVoorhis (99) and Treloar (141) 
have pointed out the easy transition from analysis of variance to the cor- 
relation ratio. 
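
The equivalence of the t-test and the F-test for two groups can be verified numerically: with a pooled estimate of variance, the square of t equals F from a one-way analysis of variance on the same two groups. The sketch below is a minimal check of that identity; the function names and the scores are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)

def sum_sq(xs):
    """Sum of squared deviations from the group mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

def pooled_t(a, b):
    """t for the difference between two independent means, pooled variance."""
    df = len(a) + len(b) - 2
    pooled_var = (sum_sq(a) + sum_sq(b)) / df
    return (mean(a) - mean(b)) / (pooled_var * (1 / len(a) + 1 / len(b))) ** 0.5

def one_way_f(groups):
    """F ratio (between-groups over within-groups mean square)."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    within = sum(sum_sq(g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (between / df_between) / (within / df_within)

group_a = [12, 15, 11, 14, 13]
group_b = [16, 18, 15, 17, 19]

t = pooled_t(group_a, group_b)
f = one_way_f([group_a, group_b])
print(round(t, 3), round(t ** 2, 3), round(f, 3))
```

The identity holds only for the two-group case; with three or more groups the F-test has no single t as its counterpart.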


Some Basic Theoretical Issues 


Three studies (21, 107, 132) gave systematic, critical consideration to 
the question whether psychological and educational scores represent true 
“measurement.” In a discussion of the logic of age-scales, Richardson 
recommended that the age-scale technic “be abolished in its entirety” 
(109: 34); a response to Richardson was made by McNemar (85). 
Sargent (116) emphasized the inadequacy of ability-measurement which 
fails to take into account differences in methods of work. Roff (111) indi- 
cated the role of item-difficulty in determining the apparent distribution 
of ability within a given group. 


Miscellaneous


The Marchant Calculating Machine Company (83) published a simple 
table to facilitate the calculation of square roots to five or more decimals. 
Croxton (31) urged the importance of standardized symbols for basic 
statistical concepts. Mottley and Embody (91) called attention to the 
importance of keeping the statistical principles employed in research open 
to public inspection and understanding. 


Trends 


Probably the two most active areas of research in statistical method 
today are the fields of sampling and factor analysis. A distinct gain in 
sampling theory is the recent weakening of the exclusive sway of the “null 
hypothesis.” The work in sampling is resulting in improved technics of 
sampling, a wider assortment of formulas from which the experimenter 
may “pick and choose,” and a better appreciation of underlying statistical 
assumptions. In factor analysis, there appears to be developing a better 
understanding both of the similarities and the differences among various 
methods of analysis. 

A “statistics of the individual” appears to be in process of formation. 
Evidence for this is the development of “inverted” factor analysis (corre- 
lation of persons instead of tests) and the studies listed in the appropriate 
section of the present review. 
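
The distinction between the usual and the “inverted” procedure can be shown on a single table of scores. The sketch below is illustrative only: the scores and function names are hypothetical, and it stops at the correlation matrices without carrying out any factoring. It also assumes that the tests are scored in comparable units; in practice the scores would ordinarily be standardized for each test before persons are correlated.

```python
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation between two equally long series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlation_matrix(series):
    """Correlate every series with every other series."""
    return [[pearson_r(a, b) for b in series] for a in series]


# Hypothetical score table: one row per person, one column per test.
scores = [
    [10, 12, 3],
    [11, 13, 2],
    [3, 4, 11],
    [2, 5, 10],
]

# Usual procedure: correlate tests (columns) over persons.
tests = [list(column) for column in zip(*scores)]
r_tests = correlation_matrix(tests)

# Inverted procedure: correlate persons (rows) over tests.
r_persons = correlation_matrix(scores)

print("tests by tests:", len(r_tests), "x", len(r_tests[0]))
print("persons by persons:", len(r_persons), "x", len(r_persons[0]))
```

The same table thus yields a 3 x 3 matrix when tests are correlated and a 4 x 4 matrix when persons are correlated; it is the latter that an inverted analysis would go on to factor.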

The analysis of variance and the discriminant-function technic are 
both enjoying an increasing appreciation and wider use. 

What may be termed the “mechanics” of statistics is, in some instances, 
being substantially assisted by the development of specially convenient 
or economical formulas, and the increasing application of machines to 
scoring, tabulation, and calculation. Other changes include a tendency 
toward the freer use of small samples, and the freer use of biserial and 
tetrachoric r; these changes, however, gain convenience and economy 
only at the sacrifice of statistical reliability. 

Needs 

A long list of needs could readily be compiled; we shall limit comment 
to five main heads: 

Need for collation—There is serious need for the collation of various 
technics and approaches. Thus, the need for caution in interpreting 
causation and the role of errors of measurement both receive careful attention 
in correlation, but analysis of variance is little concerned with either of 
these. On the other hand, analysis of variance stresses the role of com- 
bination-effects (“interaction”) and the importance of “design” in statis- 
tical studies; correlation and factor analysis tend to ignore both of these 
matters. Clinical method emphasizes individual differences in the rela- 
tionships among variables; correlation, factor analysis, and analysis of 
variance all appear satisfied simply with the average relationship. Other 
examples of the need for collation may be found in sampling theory 
(e.g., chi-square versus other tests of significance), in the measurement 
of relationship (contingency method versus correlation versus analysis 
of variance), in item analysis, and in factor analysis. 

Estimation of population parameters—The typical statistical worker 
wishes primarily to improve his estimate of the population parameter, 
rather than to arrive at a theoretically “exact” value for the probable error 
of his estimate. Many workers feel that their purpose is better served by 
study of the statistical constants for repeated small samples than by a 
calculation based on only one sample with equivalent n. The truth or 
falsity of this viewpoint deserves careful investigation, both theoretical 
and empirical. 
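
The empirical side of such an investigation might begin with a simulation of the following kind. This is a minimal sketch and settles nothing: it assumes a normal population, takes the standard deviation as the constant of interest, and compares the average of ten standard deviations from samples of five with the single standard deviation of the same fifty scores pooled. The function names, sample sizes, and number of trials are all hypothetical choices.

```python
import random


def sample_sd(xs):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(xs)
    m = sum(xs) / n
    return (sum((x - m) ** 2 for x in xs) / (n - 1)) ** 0.5


def compare_sd_estimates(k=10, m=5, trials=2000, sigma=1.0, seed=1):
    """Compare two estimates of sigma built from the same k * m scores.

    Returns the long-run average of (a) the mean of k small-sample
    standard deviations and (b) the standard deviation of the pooled sample.
    """
    rng = random.Random(seed)
    total_small = 0.0
    total_pooled = 0.0
    for _ in range(trials):
        samples = [[rng.gauss(0.0, sigma) for _ in range(m)] for _ in range(k)]
        total_small += sum(sample_sd(s) for s in samples) / k
        pooled = [x for s in samples for x in s]
        total_pooled += sample_sd(pooled)
    return total_small / trials, total_pooled / trials


avg_small, avg_pooled = compare_sd_estimates()

print("average of small-sample estimates:", avg_small)
print("average of pooled estimates:", avg_pooled)
```

For the standard deviation under these assumptions, averaging the small-sample values carries along the downward bias of each small sample (in theory about 0.94 of the population value for samples of five), while the pooled estimate lies closer to the population value. Other constants, other populations, and the comparative stability of the two procedures would each need their own examination.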

Statistics of the individual—Study of the interrelations and implications 
of the individual’s scores has usually been part science and part art. Some 
assistance could doubtless be rendered here by a well-developed statistical 
system centered principally about the individual case, rather than the total 
group. 

Machine technics—Machine technics have not been an unmixed blessing. 
The IBM test-scoring machine, for example, has displaced the 
short-answer or completion type of test item. Use of the Hollerith machines 
usually eliminates the scatter-diagram for a correlation. It is time for edu- 
cational measurement and statistics to demand and obtain machines which 
will facilitate the best available procedures, rather than sidetrack or 
eliminate them. 

Wasteful empiricism—The weights yielded by multiple correlation, the 
discriminant function, and item analysis should be used not merely to 
satisfy the needs of the immediate problem, but also to gain greater insight 
into the nature of the test- and criterion-elements. 

Finally, for the period following the review by Flanagan, the present 
review should be supplemented by the brief review by Fattu and the 
specialized review by Lorge. Apparent omissions from the present 
review can generally be found by 
reference to one of the others. 